I wonder how they censor AI models. Like, do they make the model assess every response and every prompt? Or is it something like the AI detection used for plagiarism, repurposed for censorship?
This is a great question, and to my understanding it's usually a combination of pre-processing (prompt/input filtering), post-processing (response filtering), fine-tuning during training, RLHF (reinforcement learning from human feedback), and then separate censorship tools. However, the balance of techniques companies actually use is their 'secret sauce' and highly guarded.
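To make the pre-processing and post-processing parts concrete, here's a rough sketch of what a filter wrapped around a model could look like. Everything in it is made up for illustration: `BLOCKED_TERMS`, `is_flagged`, `moderated_reply`, and `echo_model` are hypothetical names, and a keyword list is only a toy stand-in for whatever classifier a real company might use. It doesn't cover the fine-tuning or RLHF side at all.

```python
from typing import Callable

# Hypothetical blocklist, assumed for illustration only.
# Real systems more likely use trained classifiers than keyword lists.
BLOCKED_TERMS = {"forbidden topic"}

REFUSAL = "Sorry, I can't help with that."


def is_flagged(text: str) -> bool:
    """Toy stand-in for a moderation check: case-insensitive substring match."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def moderated_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model call with an input filter and an output filter."""
    # Pre-processing: check the prompt before the model sees it.
    if is_flagged(prompt):
        return REFUSAL
    response = model(prompt)
    # Post-processing: check the model's answer before the user sees it.
    if is_flagged(response):
        return REFUSAL
    return response


def echo_model(prompt: str) -> str:
    """Placeholder 'model' that just echoes its input."""
    return f"You asked: {prompt}"


if __name__ == "__main__":
    print(moderated_reply("What's the weather like?", echo_model))
    print(moderated_reply("Tell me about the Forbidden Topic", echo_model))
```

The point of the sketch is just that both checks sit outside the model itself, which is why they can be swapped or tuned without retraining anything.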
Great video. Good to see you back on YT.
Love this. Thank you for all the hard work you're doing and have done. I look forward to seeing all the new updates.
Thank you, I really appreciate that! ☺️
Good info. Thanks.
Thanks for watching! :)