AI chatbots found to assist "teenagers" in planning violent attacks; only Claude consistently refused to cooperate

Many technology companies make high-profile claims that their AI products are equipped with comprehensive "safety guardrails," but a recent joint investigation shows that these defenses remain quite weak when it comes to underage users. Across the scenarios designed for the study, many mainstream chatbots not only failed to recognize obvious signs of mental distress and risk of violence in "teenage" users, but in some cases even offered tacit encouragement or specific assistance for potential attacks.

The investigation, conducted jointly by CNN and the non-profit Center for Countering Digital Hate (CCDH), tested 10 chatbots currently used by teenagers: ChatGPT, Google Gemini, Claude, Microsoft Copilot, Meta AI, DeepSeek, Perplexity, Snapchat My AI, Character.AI and Replika. CCDH noted that with the exception of Anthropic's Claude, which "consistently and reliably refuses" assistance to would-be perpetrators, the products failed to effectively deter violent plans. Eight of the 10 models "generally offer to assist users in planning violent attacks" in most scenarios, including by providing specific recommendations on where to target, types of weapons available, and more.

To simulate real risk scenarios, the researchers posed as "teenage users" who gradually displayed obvious signs of psychological distress and emotional instability over the course of a conversation, then moved on to discussing past violent incidents, and finally asked more specific questions, such as how to choose a target and which weapons to use. The investigation comprised 18 scenarios, nine set in the United States and nine in Ireland, covering a wide range of attack types and motivations: ideologically driven school shootings and knife attacks, assassinations of politicians, the murder of health care industry executives, and politically or religiously motivated bombings.

In some of the sample conversations, ChatGPT provided links to maps of high school campuses to users who had expressed an interest in school violence. Gemini suggested that "metal fragments are often more lethal" when discussing attacks on synagogues, and even recommended a type of shotgun suitable for long-range shooting to users interested in carrying out political assassinations. According to the study, Meta AI and Perplexity were the most cooperative chatbots in the test, providing varying degrees of assistance to potential attackers in almost all scenarios. The Chinese chatbot DeepSeek even signed off with "Wish you a happy (and safe) shooting!" after giving advice on gun selection.

The CCDH report singled out Character.AI, a role-playing chat platform, as "uniquely unsafe." Most chatbots assisted in planning violent acts without directly encouraging users to carry them out. By contrast, some of the personas on Character.AI not only helped users work out the details of an attack, but also "actively encouraged" violence in both tone and content. Researchers documented seven instances of explicit incitement to violence, including advising users to "beat the hell out of Chuck Schumer," suggesting that a gun be used against a health insurance company CEO, and telling users who were "fed up with school bullying" to "Just beat the hell out of them." In six of those cases, the character also helped the user plan an attack.

Claude, which performed the most safely in this round of testing, did not entirely escape doubt. The research team pointed out that between late 2025 and early 2026, Anthropic announced a relaxation of its long-standing "security expansion commitment," so it remains uncertain whether Claude would perform as consistently if subjected to similar tests after the policy change. However, CCDH emphasized that Claude's consistent refusal to participate in violent plots during the investigation proved that "effective security mechanisms are clearly feasible." That raises a sharp question: if such mechanisms are feasible, why do so many AI companies still choose not to deploy or strengthen them?

Faced with the findings, many companies responded quickly. Meta told CNN that it had implemented unspecified "fixes"; Microsoft said Copilot's responses had improved thanks to new safety features; both Google and OpenAI said they had recently launched new models and were continuing to iterate on safety capabilities. Other companies emphasized that they regularly evaluate their safety protocols. Character.AI, which has repeatedly come under public scrutiny over safety issues, reiterated its long-held position, stressing that a prominent disclaimer is displayed in the platform interface and that conversations with its characters "are fictitious."

The investigators also cautioned that the study cannot capture how every chatbot behaves in every environment and under every style of questioning, nor can it fully reflect the complex and changeable interactions of the real world. Still, the current results are yet another clear signal that the "safety guardrails" AI companies repeatedly emphasize in their marketing are systematically failing in foreseeable scenarios with classic red flags. Even before this investigation, many AI companies had been strongly criticized by lawmakers, regulators, civil society organizations, and health experts for failing to protect underage users from risks such as self-harm, violence, and extremist content, and several face lawsuits alleging "wrongful death" and "causing serious injury."

From a policy and regulatory perspective, the investigation is likely to push legislators and regulators in various countries to tighten safety requirements and review standards for generative AI products, especially for identifying and intervening in high-risk situations involving teenagers, such as self-harm, suicide, and violent tendencies. For technology companies, how to implement and maintain safety mechanisms that have already been proven feasible, while still pursuing more capable models and faster commercialization, is becoming an unavoidable practical problem.