Study finds about half of medical advice from five major AI chatbots is “problematic”

On April 15, Bloomberg reported on a new study showing that AI chatbots give problematic answers about half of the time when providing medical advice. The finding highlights the health risks posed by AI, an emerging technology that is increasingly woven into daily life.

Researchers from the United States, Canada, and the United Kingdom evaluated five leading AI platforms (ChatGPT, Gemini, Meta AI, Grok, and DeepSeek) by asking each platform 10 questions in five health categories. According to the research, published this week in the medical journal BMJ Open, about 50% of all responses from these chatbots were rated "problematic," and nearly 20% were classified as "highly problematic."

The study found that the chatbots' performance varied markedly by question type. They did relatively well on closed-ended questions (those with definite answers) and on questions about vaccines and cancer, but did worse on open-ended questions and in areas such as stem cell research and nutrition.

[Figure legend: yellow indicates a problematic response; orange indicates a highly problematic response.]

The researchers noted that the responses were often delivered in a confident, assured tone, yet none of the chatbots produced a complete and accurate reference list for any prompt. Across the entire study, the chatbots declined to answer only twice, and both refusals came from Meta AI.

The findings underscore a growing concern: people are increasingly turning to generative AI platforms for medical advice, yet these platforms are not licensed to give such advice and lack the clinical judgment needed to make a diagnosis.

The rapid spread of AI chatbots has made them a popular tool for people seeking guidance about their health conditions. OpenAI says that more than 200 million people turn to ChatGPT every week with health and wellness questions. In January of this year, the company announced that it would launch separate health tools for general users and for clinicians. That same month, Anthropic announced a new healthcare service for its Claude product.

The authors of the BMJ Open study said that if chatbots are deployed without public education and regulation, a significant risk is that they could amplify the spread of misinformation.

They said the findings "highlight important behavioral limitations of AI and suggest the need to re-evaluate how AI chatbots are deployed in public-facing health and medical communications." They also noted that these systems are often able to generate "authoritative-sounding but potentially flawed responses."