
CSIRO scientists find ChatGPT not what the doctor ordered


Using AI to treat ailments instead of seeking medical advice could be making us sicker. 

In a world-first study, Australian scientists found that the more evidence included with a health-related question, the less reliable ChatGPT's answer becomes.

In some scenarios, accuracy dropped to as low as 28 per cent.
 
As large language models (LLMs) like ChatGPT explode in popularity, health authorities are warning they pose a potential risk to the growing number of people using online tools for key health information.  
 
Scientists from CSIRO, Australia’s national science agency, and The University of Queensland (UQ) explored a hypothetical scenario of an average person (non-professional health consumer) asking ChatGPT if ‘X’ treatment has a positive effect on condition ‘Y’.    
 
The 100 questions presented ranged from “Can zinc help treat the common cold?” to “Will drinking vinegar dissolve a stuck fish bone?”  
 
ChatGPT’s response was compared to the known correct response, or “ground truth”, based on existing medical knowledge.   
 
CSIRO Principal Research Scientist and Associate Professor at UQ Dr Bevan Koopman said that although the risks of searching for health information online are well documented, people continue to do so, increasingly via tools such as ChatGPT.
 
“The widespread popularity of using LLMs online for answers on people’s health is why we need continued research to inform the public about risks and to help them optimise the accuracy of their answers,” Dr Koopman said.  
 
“While LLMs have the potential to greatly improve the way people access information, we need more research to understand where they are effective and where they are not.”  
 
The study looked at two question formats. The first was a question only. The second was a question biased with supporting or contrary evidence.   
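The two formats can be sketched as plain prompt strings. This is an illustrative mock-up, not the researchers' code: the question is one of the article's examples, while the evidence passage is an invented placeholder rather than material from the study.

```python
# Sketch of the study's two prompt formats (illustrative only).

# Example question quoted in the article.
question = "Can zinc help treat the common cold?"

# Hypothetical supporting-evidence passage, invented for illustration.
evidence = "A review of clinical trials reported that zinc lozenges shortened colds."

# Format 1: the question on its own.
question_only_prompt = question

# Format 2: the question biased with supporting (or contrary) evidence.
evidence_biased_prompt = f"{evidence}\n\n{question}"

print(question_only_prompt)
print(evidence_biased_prompt)
```

In the second format, the same question is simply prefixed with an evidence passage; the study found this extra context lowered rather than raised answer accuracy.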
 
Results revealed that ChatGPT was quite good at giving accurate answers in a question-only format, with an 80 per cent accuracy in this scenario.   
 
However, when the language model was given an evidence-biased prompt, accuracy fell to 63 per cent.

It fell further, to 28 per cent, when an "unsure" answer was allowed. This finding runs contrary to the popular belief that prompting with evidence improves accuracy.
 
“We’re not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy,” Dr Koopman said.  
 
ChatGPT launched on 30 November 2022 and has quickly become one of the most widely used large language models (LLMs).  


Newcastle Weekly
