Wikipedia editors have just adopted a new policy to help them cope with the flood of AI-generated articles hitting the online encyclopedia. The policy gives administrators the ability to quickly delete AI-generated articles that meet certain criteria. This is not only crucial for Wikipedia; it also offers an important example of how a platform can deal with the growing chaos caused by artificial intelligence.

Wikipedia is maintained by a global, collaborative community of volunteer contributors and editors. Part of what makes it a reliable source of information is that this community spends a lot of time discussing, deliberating, and debating everything that happens on the platform, from changes to individual articles to the policies that govern those changes. Deleting an entire Wikipedia article is not unusual, but the main deletion process typically involves a week-long discussion period, during which Wikipedia users try to reach a consensus on whether to delete the article.

However, to deal with common issues that clearly violate Wikipedia policies, Wikipedia also has a "speedy deletion" process, in which one person flags an article, an administrator checks whether it meets certain conditions, and the administrator then deletes the article without a discussion period.

For example, entries that consist entirely of hallucinatory gibberish, meaningless text, or what Wikipedia calls "nonsense" can be flagged for speedy deletion. The same goes for entries that are merely advertisements and have no encyclopedic value. But if someone flags an article for deletion because it is "probably not noteworthy," that is a more subjective assessment that requires a full discussion.

Currently, articles that Wikipedia editors flag as AI-generated mostly fall into the latter category, because editors cannot be entirely sure that they were generated by AI. Ilyas Lebleu is a founding member of the Wikipedia AI Cleanup Project and an editor who contributed some critical language to the recently adopted policy on AI-generated articles and speedy deletion. This, they told me, is why previous proposals to regulate AI-generated articles on Wikipedia have struggled.

“While it can be easy to spot signs that something is AI-generated (e.g. word choice, dashes, bulleted lists with bold headings), those signs are often not clear-cut, and we don’t want to mistakenly remove content just because it sounds like AI,” Lebleu told me in an email. "Overall, the rise of easily generated AI content has been described as an 'existential threat' to Wikipedia: since our processes are geared toward (often lengthy) discussion and consensus building, the ability to quickly generate large amounts of false content is problematic if we don't have a way to quickly remove it. Of course, AI content is not uniquely bad, and humans are perfectly capable of writing bad content too, but certainly not at the same speed. Our tools are designed for a completely different scale."

The solution the Wikipedians settled on is to allow speedy deletion of articles that are clearly generated by artificial intelligence and that meet either of roughly two conditions. The first is that the article contains content “designed to communicate with users.” This refers to text in the article that is clearly a large language model (LLM) responding to a user's prompt, such as "This is your Wikipedia article about...", "As of my last training update..." and "As a large language model." Phrases like these are a clear sign that the article was generated by an LLM, a tell we have also previously used to identify AI-generated social media posts and scientific papers.

Lebleu told me they've seen such cases "many times," and, more importantly, that these cases show the user didn't even read the article they submitted.

"If users don't check these basic things, we can safely assume that they didn't check anything they copied and pasted, and it's as useless as white noise," they said.

The other condition that makes an AI-generated article eligible for speedy deletion is that its references are obviously wrong, another mistake that LLMs are prone to making. This may include external links to books, articles, or scientific papers that do not exist and do not resolve, or links to completely unrelated content. Wikipedia's new policy gives one example: "A paper on a beetle species is cited in a computer science article."

Lebleu said speedy deletion is a "stop-gap measure" that addresses the most obvious problems, and that the AI problem will persist because a great deal of AI-generated content does not meet the new conditions for speedy deletion. They also noted that AI could be a useful tool and a positive force for Wikipedia in the future.

"However, the current situation is very different, and speculation about where technology will go in the next few years can easily distract us from solving current problems," they said. "A key pillar of Wikipedia is that we have no set rules, and any decisions we make today may be revisited in a few years as technology evolves."

Lebleu said the new policy will ultimately leave Wikipedia in a better position than before, but it's not perfect.

"The good news (besides the speedy deletion itself) is that we have formally made a statement on articles generated by large language models," they said. "This has been controversial in the community: while the vast majority oppose AI content, exactly how to deal with it has been a point of contention, and early attempts to develop broad policy failed. Here, building on previous progress on AI images, drafts, and discussion comments, we worked out a more specific criterion, but one that explicitly states that unmoderated large language model content is incompatible in spirit with Wikipedia."
