Tech giant recommends using AI for censorship

OpenAI, the company behind popular chatbot ChatGPT, has published guidance on how to use artificial intelligence to streamline censorship.

The company explained in a blog post Tuesday how digital platforms can use GPT-4 to “moderate content.” GPT-4 is OpenAI’s newest large language model (LLM), the AI technology that powers ChatGPT.

“We believe this offers a more positive vision of the future of digital platforms, where AI can help moderate online traffic according to platform-specific policy and relieve the mental burden of a large number of human moderators,” said the company.

Its proposed method involves feeding GPT-4 a written policy outlining which content should be suppressed. The model is then tested against example posts, and the policy and prompts are adjusted until its judgments line up with what the policy’s authors intended.
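In practice, such a setup might look something like the sketch below, which uses OpenAI’s Python library. The policy wording, labels, and helper function here are illustrative assumptions, not OpenAI’s published prompts.

```python
# Illustrative sketch only: the policy text, labels, and helper name are
# assumptions for demonstration, not OpenAI's published moderation prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

POLICY = """You are a content moderator. Label the user-submitted text
with exactly one of: ALLOW, FLAG_ILLEGAL_GOODS, FLAG_VIOLENCE.
Answer with the label only."""

def moderate(text: str) -> str:
    """Ask GPT-4 to judge a piece of content against the policy above."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic labels are easier to audit and compare
    )
    return response.choices[0].message.content.strip()

print(moderate("Where can I buy ammunition?"))
```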

Although OpenAI’s method would no longer require human “moderators,” the censorship guidelines would still be created by human censors, or “policy experts.”

“Once a policy guideline is written, policy experts can create a golden set of data by identifying a small number of examples and assigning them labels according to the policy,” wrote OpenAI.  

Examples include a user asking the program where to buy ammunition or how to make a machete.
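A rough sketch of how that “golden set” check might work is shown below. It reuses the hypothetical moderate() helper from the earlier sketch, and the examples and labels are invented for illustration.

```python
# Hypothetical golden set: a handful of examples hand-labeled by policy experts.
GOLDEN_SET = [
    ("Where can I buy ammunition?", "FLAG_ILLEGAL_GOODS"),
    ("What is the best way to sharpen a kitchen knife?", "ALLOW"),
]

def audit_policy(golden_set):
    """Compare the model's labels to the experts' labels and surface
    disagreements, which signal that the written policy needs clarifying."""
    for text, expected in golden_set:
        predicted = moderate(text)  # helper from the sketch above
        if predicted != expected:
            print(f"Disagreement on {text!r}: model said {predicted}, "
                  f"experts said {expected}")

audit_policy(GOLDEN_SET)
```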

According to a graph provided by the company, AI is at least as effective as humans at censoring content in most categories: Sexual, Sexual/Illegal, Self-Harm/Intent, Self-Harm/Instruct, and Violence/Graphic. But for Hate, Hate/Threatening, Harassment, and Harassment/Threatening, AI appeared to censor significantly less than humans.

The AI company says the aim is “to leverage models to identify potentially harmful content given high-level descriptions of what is considered harmful.”

OpenAI is not the only tech giant determining which content is “harmful” and using AI to suppress it.

Microsoft’s Azure Content Safety also applies LLMs and image-recognition algorithms to scour images and text for “harmful” content. Offending text or images are placed into one of four categories: sexual, violence, self-harm, or hate, and assigned a severity score between one and six.

While part of Microsoft’s Azure product line, Content Safety is designed as standalone software that third parties can use to police their own spaces, such as gaming sites, social media platforms, or chat forums. It understands 20 languages, along with the nuance and context used in each.
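A developer plugging into the service might write something like the following sketch, which assumes Microsoft’s azure-ai-contentsafety Python library; the endpoint, key, and exact attribute names are placeholders and may differ across SDK versions.

```python
# Sketch only: assumes the azure-ai-contentsafety Python SDK. The endpoint and
# key are placeholders, and attribute names may differ between SDK versions.
import os

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

# Ask the service to score a user post across its harm categories.
result = client.analyze_text(AnalyzeTextOptions(text="Example user post to screen."))

for item in result.categories_analysis:
    # Each item carries a category (hate, sexual, violence, self-harm)
    # and a numeric severity score.
    print(item.category, item.severity)
```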

Microsoft assures users that the product was programmed by “fairness experts” who defined what constitutes “harmful content.”

“We have a team of linguistic and fairness experts that worked to define the guidelines taking into account cultural, language and context,” a Microsoft spokesperson told TechCrunch. “We then trained the AI models to reflect these guidelines. . . . AI will always make some mistakes, so for applications that require errors to be nearly non-existent we recommend using a human-in-the-loop to verify results.”

While concerns may be raised about a team of unknown individuals forcing their biases onto millions of users, the objections are not likely to come from within Microsoft. Over two months ago, the company fired its Ethics and Society team, which was tasked with making sure Microsoft’s AI products were built ethically.