Researchers invented an eye disease called "bixonimania," and multiple AI chatbots echoed it

Over the past year or so, anyone who typed symptoms such as "staring at the screen for too long, itchy eyes, and reddish eyelids" into several mainstream AI chatbots was likely to receive a strange diagnosis: a new disease called "bixonimania." This so-called disease does not exist anywhere in the legitimate medical literature. It originated entirely in a deliberately designed experiment by medical researcher Almira Osmanovic Thunström and her team at the University of Gothenburg in Sweden.

On March 15, 2024, two blog posts introducing "bixonimania" appeared on the platform Medium. Then, on April 26 and May 6, two forged academic preprints were uploaded to the academic social networking site SciProfiles. The listed author, "Lazljiv Izgubljenovic," does not exist, and the author's profile images were generated with AI. The author's supposed employer, "Asteria Horizon University," and its location, "Nova City, California," are also fictitious. Even the institutions named in the papers' acknowledgments, including "Starfleet Academy," "Enterprise," the "Professor Sideshow Bob Foundation," "Fellowship of the Ring University," and the "Galactic Triad," are borrowed from popular fiction and cartoons, so the hints are extremely obvious. Early in the text, the papers include statements such as "the entire paper is fabricated" and "50 fictitious subjects were recruited," all but announcing to any attentive reader that this is a joke.

Osmanovic Thunström said that she originally conceived the experiment to show students how large language models build knowledge from large web-crawl datasets such as Common Crawl, and to demonstrate how "prompt injection" can hijack a chatbot from outside its safety guardrails. Drawing on her medical background, she chose a health-related theme and deliberately used a funny-sounding name, bixonimania, to underline its fictional nature: any doctor who sees the name of an eye disease ending in "-mania" would know something was wrong, since that suffix is a psychiatric term.

However, the experiment "went a little too far." Within weeks of the material being uploaded, Microsoft's Bing Copilot was describing bixonimania as a "real and rare disease," while Google Gemini called it a "disease caused by overexposure to blue light" and recommended that users see an eye doctor. During the same period, Perplexity AI gave a specific "prevalence rate" of about 1 in 90,000 people, and OpenAI's ChatGPT would judge whether a user's described symptoms were consistent with bixonimania. Some of these answers came in response to users who asked directly about bixonimania. Others followed general questions that only described something like "blue light causes eyelid pigmentation," with the model connecting the symptoms to the fictitious disease name on its own initiative.

The responses shocked some experts. Alex Ruani, who researches health disinformation at University College London, warned that if the scientific system and the systems that support it cannot identify and filter out such "junk," the consequences will be disastrous. She called the case "a textbook example of how misinformation and disinformation work" and stressed that "it may seem funny, but the problem is very serious."

False information on the Internet is not a new problem. Search engines such as Google have fought "fake content" and "misleading content" for many years, filtering out bad information by updating their ranking algorithms. Unlike traditional search, however, large generative models have inherent weaknesses in vetting information and tracing its sources, and they often make things up with complete confidence when they lack a reliable basis. Since the fake papers appeared, some of the latest model versions have learned to express skepticism about bixonimania. On March 11, 2026, for example, ChatGPT noted that the term "is likely to be a fake or borderline, pseudoscientific label." But just a few days later, in another round of Q&A, it described bixonimania as "a new subtype of periorbital melanosis associated with blue light exposure from digital screens."

Other systems wobble in similar ways. In mid-March of this year, Microsoft Copilot replied that bixonimania "is not yet widely recognized as a medical diagnosis, but multiple newly published papers and case reports regard it as a benign misdiagnosed disease related to prolonged blue light exposure." In January, Perplexity described it as "a newly emerging term." When asked about these statements, the companies responded in turn. Perplexity said that its "biggest advantage is accuracy"; although it did not claim to be "100% accurate," it described itself as "the AI company that values accuracy most." OpenAI said that the model behind the current version of ChatGPT has improved significantly at providing safe and accurate medical information, and that earlier research reflected the behavior of older-generation models. Asked about Gemini's past description of bixonimania as a real disease, a Google spokesperson said that the response reflected the performance of early models. The spokesperson emphasized that the company has been "frank about the limitations of generative AI," that it prompts users within the app to "check information," and that it recommends consulting professionals on sensitive topics such as medical care. Microsoft did not respond to a request for comment.

Part of the problem is that an AI model's output depends heavily on exactly how a question is phrased and on the sources the model draws on. If you search for "bixonimania," Google's AI Overview might treat it as a legitimate condition; if you ask "Does bixonimania really exist?" the same feature might confirm that it is not legitimate and is just a made-up term.

The "success" of the bixonimania experiment also owes something to how authentic its packaging looks: it uses the professional format of academic papers and clinical documents, so it resembles an official source. Mahmud Omar, a doctor at Harvard Medical School who works on medical AI research, found in a study covering 20 large models that when input text is presented in a professional medical style, such as a discharge summary or a clinical paper, the models are more likely to embellish the original information and produce hallucinations. If the text comes from social media and has a more casual tone, hallucinations are less likely. He pointed out that AI companies iterate on their models extremely quickly, and that the industry has not yet settled on a unified process or consensus for automated, rigorous testing of each version, which makes safety evaluation and standardized control much more difficult.

Even more alarming, the experiment eventually crossed the boundary between machines and humans and made its way into a published medical journal. The bixonimania preprints have been cited in a handful of papers, including one in the medical journal Cureus by authors at the Maharishi Markandeshwar Institute of Medical Sciences and Research in Mullana, India. The article cited one of the forged preprints and wrote: "Bixonimania is an emerging form of periorbital melanosis (POM) associated with blue light exposure, and its mechanism requires further study." After Nature's news team contacted the journal for confirmation, Cureus announced on March 30, 2026, that it was retracting the paper, on the grounds that the article contained three irrelevant references, including one that pointed to a fictional disease, and that the editorial team therefore "can no longer maintain confidence in the accuracy and source of this work." The authors disagreed with the decision, but the paper was formally retracted.

Ruani believes the incident goes far beyond "AI talking nonsense," because it also fooled humans and revealed that researchers' trust in the sources and content of the literature is being eroded. "We need to protect our trust like gold," she said. "The current situation can be described in one word: chaos."

Osmanovic Thunström had concerns of her own when designing the experiment. She worried that deliberately "seeding" a fake disease in the scientific literature could cause real harm. She therefore consulted an ethics adviser about the potential risks and deliberately chose a relatively low-risk, minor skin condition as the subject to limit any negative impact. "What I want to make sure is that by doing experiments in this way, we are reducing harm rather than creating more harm," she said.

The chain reaction around bixonimania shows how easily disinformation can slip through multiple layers of technical and institutional defenses at a time when generative AI is developing rapidly and academic work depends heavily on digital tools. From chatbots to peer-reviewed journals, machines and humans alike took part in this "collective deception." That forces academia, industry, and regulators to reconsider two questions: how to recalibrate what "credibility" means now that AI participates in producing knowledge, and how to draw clearer and more stable lines of caution while still pursuing efficiency.