Since the launch of ChatGPT, it has faced allegations of excessive censorship by its parent company, OpenAI. Many think that being unable to converse with a chatbot about “negative” subjects infringes on free speech, while others argue this is simply alignment, where the model behaves according to what it perceives as the intent behind the prompt.
One could argue that companies are justified in censoring, since a chatbot’s responses reflect the company’s PR goals. But how much censoring is too much?
Developers and researchers across the OpenAI community forums have complained that ChatGPT has become “nearly useless,” as overzealous guardrails lead the model to refuse even basic questions about subjects that might be morally or ethically questionable.
Here is one of many users disgruntled that they cannot get even mildly controversial questions answered:
“I have received this message
Multiple times this morning in the normal course of my work which is summarizing trending news stories. The first one was regarding Carl Weathers’ passing. The second was regarding an earthquake. These are not dangerous subjects and ChatGPT is getting censored to the point of being unusable. So we can’t talk about natural disasters or death, now? What’s next? This censorship is not helping the user experience and it’s detrimental to the general human experience. The devs at ChatGPT should not be allowed to censor our inputs to this degree. It is indicative of a small group of people (ChatGPT Devs) attempting to control the freedom of information and the press. I understand censoring things that are illegal or dangerous (I guess, to avoid any potential legal liability?) but now you can’t even write an article celebrating the life of a great actor or talking about earthquakes? This is ridiculous and unacceptable. I am asking for a refund for this month of service, and I am charging the devs at OpenAI to rethink their position and why they believe it’s their job to police and censor us, the paying user, to this degree.”
However, on the flip side, many people applaud OpenAI’s control over its generated content, stating that a censor-free model could lead people down very dark paths.
Censorship and Alignment in AI
Alignment refers to ensuring a model behaves according to the intention behind the prompt. Much of this comes down to prompt engineering: a prompt is, in essence, a body of text in which the user defines, or rather describes, their intent and, by implication, the intended outcome.
Iteratively optimizing prompts can aid model alignment: prompts are refined for a specific model and use case until they converge on an optimal prompt for that solution.
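As a rough illustration only, such a refinement loop might look like the sketch below. The call_model and score helpers are hypothetical placeholders, not any vendor’s real API; they stand in for whatever LLM endpoint and evaluation criterion a team actually uses.

```python
# Minimal sketch of iterative prompt refinement.
# call_model() and score() are hypothetical placeholders: wire them up
# to whatever LLM endpoint and evaluation criterion you actually use.

def call_model(prompt: str) -> str:
    """Send the prompt to an LLM and return its response."""
    raise NotImplementedError

def score(response: str, intent: str) -> float:
    """Rate, from 0 to 1, how well the response matches the stated intent."""
    raise NotImplementedError

def refine_prompt(prompt: str, intent: str, rounds: int = 3, target: float = 0.9) -> str:
    """Refine the prompt until the model's response matches the intent closely enough."""
    for _ in range(rounds):
        response = call_model(prompt)
        if score(response, intent) >= target:
            break
        # Fold an explicit clarification of the intent back into the prompt and retry.
        prompt = f"{prompt}\n\nClarification of intent: {intent}"
    return prompt
```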
One of the big arguments for censorship in LLMs is its potential to mitigate the spread of inappropriate content: without safeguards, these systems could inadvertently promote hate speech or misinformation. Filtering out explicit material is also important if such chatbots are freely available to children.
But there are counters to that as well. Beyond the free-speech concern, censorship in LLMs can itself become biased, since the rules are encoded in predefined algorithms. Additionally, cultural differences influence what counts as objectionable, further complicating the issue.
According to Micah Hill-Smith, the founder of AI research firm Artificial Analysis (via Gizmodo), much of this censorship comes from a late stage in training AI models called “reinforcement learning from human feedback,” or RLHF.
That process comes after the algorithms build their baseline responses, and involves humans stepping in to teach the model which responses are good and which are bad.
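To make the idea concrete, here is a highly simplified sketch of the reward-modelling step at the heart of RLHF, written in PyTorch. The reward_model here is a hypothetical network that scores a (prompt, response) pair; this illustrates the general technique, not OpenAI’s actual training code.

```python
import torch.nn.functional as F

# Highly simplified sketch of RLHF reward modelling.
# reward_model is a hypothetical module that scores a (prompt, response)
# pair; in practice it is typically a fine-tuned language model.

def preference_loss(reward_model, prompt, chosen, rejected):
    """Bradley-Terry style loss: push the score of the response humans
    preferred above the score of the response they rejected."""
    r_chosen = reward_model(prompt, chosen)      # scalar tensor
    r_rejected = reward_model(prompt, rejected)  # scalar tensor
    return -F.logsigmoid(r_chosen - r_rejected)
```

During training, the resulting reward model is then used to steer the chatbot toward responses human raters prefer, which is where many refusals originate.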
However, reinforcement learning is not the only method for adding safeguards to AI chatbots.
Safety classifiers are tools used alongside large language models to sort incoming prompts into “good” and “adversarial” bins. They act as a shield, so certain questions never even reach the underlying AI model.
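A toy version of that shield might look like the following. The blocked-pattern list and the is_adversarial heuristic are purely illustrative assumptions; production classifiers are themselves trained models, not keyword filters.

```python
# Toy sketch of a safety classifier sitting in front of a chat model.
# Real systems use trained classifiers, not keyword lists; everything
# here is an illustrative placeholder.

BLOCKED_PATTERNS = ["how to make a bomb", "synthesize a nerve agent"]

def is_adversarial(prompt: str) -> bool:
    """Crude stand-in for a learned classifier."""
    lowered = prompt.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)

def answer(prompt: str) -> str:
    if is_adversarial(prompt):
        # The request is refused before it ever reaches the underlying model.
        return "Sorry, I can't help with that."
    return call_model(prompt)  # hypothetical LLM call, as in the earlier sketch
```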
Censorship can be seen as an extreme extension of alignment, and while it is important for companies to set ethical boundaries, those boundaries should not undermine the chatbot’s everyday use cases.
Examples of Censorship in ChatGPT
Take the example of asking a chatbot “how to make a bomb”. The information is freely available in manuals and easily found on Google; it is hardly secret knowledge that no one must know.
Yet due to ethical boundaries, the bot won’t answer questions even remotely related to the subject.
Take another example, where we asked the bot for a caption comparing a friend to a snake. No public figures were mentioned, and suggesting captions is very much part of AI’s intended use case. It again refused, on moral grounds.
While there are many ways to bypass such rules through jailbreaking, as Anthropic discussed recently, it is still not simple for the average user.
Conclusion
Censorship is an industry-wide standard for closed-source LLMs and their makers, but it has now reached a critical point where heavy-handed censoring is detrimental to the functioning of the chatbot. Censorship should be limited to guardrails against inflammatory text and biased opinions, and should not extend to factual information.