New BBC research finds that AI chatbots still fall well short on news accuracy, with nearly half of their answers containing errors. According to an analysis jointly released by the BBC and 22 other European public media organizations, covering 18 countries and 14 languages, about 45% of the answers AI chatbots gave when they drew on news organizations' content to answer related questions contained errors.

These errors include inaccurate statements and misquotes, as well as outdated information and source mismatches. The report notes that chatbots often provide links that do not match the sources they cite and, even when they cite material accurately, fail to distinguish fact from opinion or satire from legitimate journalism.

Major technology companies, including OpenAI, Google, and Microsoft, are actively promoting generative AI chatbots and integrating them deeply into internet platforms to help users find and analyze information automatically. Although developers continue to invest heavily in reducing "hallucination" (that is, fabricated AI content), the report shows that the problem will be difficult to solve completely in the short term.

In testing, mainstream AI tools including ChatGPT, Copilot, and Gemini all made significant errors. For example, they incorrectly stated that Pope Francis is still in office, when in fact he has been succeeded by Leo XIV. Some even reported the date of Francis' death correctly yet still described him as the current pope. The tools also gave outdated or incorrect information about who holds other leadership positions.

The report also shows that these problems are not limited to one region or language but are widespread around the world. Google's Gemini was the least accurate, with as many as 72% of its responses containing significant sourcing errors. OpenAI previously attributed such errors to early versions having been trained only on data up to September 2021 and lacking access to real-time internet information. That explanation no longer applies, so the problem most likely stems from the algorithms themselves and is difficult to fix through data updates alone.

Although the share of serious errors has fallen from 51% to 37% since the BBC's separate test in February this year, Gemini still lags behind other products. Despite the poor results, the researchers found that the British public places considerable trust in AI news summaries: more than one-third of British adults, and nearly half of those under 35, believe that AI can summarize news content accurately. In addition, 42% of the public said that when AI misrepresents original news content, they would also question the reliability of the news organization itself or trust it less. Experts warn that the growing popularity of generative AI tools could seriously damage the reputation and credibility of mainstream news organizations if such problems persist.