The World Health Organization (WHO) has issued a stark warning against using artificial intelligence, especially large language models, without taking appropriate precautions against the risks of bias and misdiagnosis. The warning comes as big tech companies explore the healthcare benefits of their large language AI models. Google has published a version of its PaLM 2 model specifically for healthcare, and OpenAI says that GPT-4, the model behind ChatGPT, has passed a series of medical exams.

Users are turning to AI to help with their diagnosis before visiting a doctor, and there are limited trials for using generative AI tools in therapy, the WHO says. But the problem is that if the data used to train the model lacks diversity, it can result in misdiagnosis or bias against certain groups. It can also lead to abuse if used by people who are not trained to understand the results.
The WHO says it is excited about the potential of large language models to support healthcare professionals, patients, researchers and scientists, particularly by improving access to health information, serving as a decision-support tool and improving diagnostic capacity, but warns that the risks “must be carefully examined.”
“There is concern that caution that would normally be exercised for any new technology is not consistently exercised with LLMs,” the organization warned. “This includes widespread adherence to key values of transparency, inclusion, public engagement, expert oversight and rigorous evaluation.”
The speed of adoption is one of the main risks highlighted by the WHO. OpenAI’s ChatGPT was released in November last year and within four months became one of the fastest-growing consumer applications in history. It has sparked a revolution in the tech industry, with vendors rushing to incorporate generative AI tools into their software.
Google released a version of its new PaLM 2 large language model, known as Med-PaLM 2, in April. The company said: “Industry-tailored LLMs, like Med-PaLM 2, are part of a growing family of generative AI technologies that have the potential to significantly improve healthcare experiences.”
Microsoft, a major investor in OpenAI, had its research division put GPT-4 to the test against a series of American medical exams. The researchers reported: “Our results show that GPT-4, without specialized prompt crafting, outperforms the passing score by more than 20 points and outperforms previous general-purpose models as well as models specifically tailored to medical knowledge.”
This suggests, according to Microsoft’s researchers, that there are “potential uses of GPT-4 in medical education, assessment and clinical practice”, though they added that any such use should come “with appropriate attention to challenges of accuracy and safety.”
Healthcare LLMs require rigorous testing and assessment
But the WHO said: “Premature adoption of untested systems could lead to errors by healthcare workers, cause harm to patients, destroy confidence in AI and thereby undermine (or delay) the potential long-term benefits and use of such technologies around the world.”
The main concerns are about the data used to train the models, especially the risk that it may be biased and generate misleading or inaccurate information that could harm health, equity and inclusiveness in care, the organization warned. It was also concerned that LLMs often produce “hallucinations”, or inaccurate information that sounds legitimate to someone unfamiliar with the subject.
Other concerns include the use of training data not collected with appropriate consent, particularly sensitive health information, and the potential to generate persuasive disinformation that may appear to be reliable health content. The WHO advises policymakers to put patient safety and safeguards at the heart of any legislation on the use of LLMs. It also proposes that clear evidence be presented and measured before LLMs are approved for widespread use in routine healthcare and medicine.
A study of the ethics of LLMs in medicine by Hanzhou Li of Emory University School of Medicine found that the use of the technology, regardless of the model or approach, raises “crucial ethical issues” about trust, bias, authorship, equality and privacy. “While it is undeniable that this technology has the power to revolutionize medicine and medical research, it is essential to consider its potential consequences,” Li wrote.
In the study, published last month in the medical journal The Lancet, Li said: “An outright ban on the use of this technology would be short-sighted. Instead, establishing guidelines aimed at using LLMs responsibly and effectively is crucial.”
Regulatory oversight likely for AI in healthcare
The final evaluation of whether AI in healthcare is safe will likely come down to regulators. Where there is a high risk, or if something is classified as a medical device, then it would normally have to go through a series of trials before it could be used in diagnosis or any aspect of healthcare.
In the United Kingdom, the Medicines and Healthcare products Regulatory Agency (MHRA) published a blog post in March about the potential of these models, and of chatbots such as Bard or ChatGPT, as medical tools. It found that because a general-purpose chatbot is not intended to be used for diagnosis, it is unlikely to be a medical device. “However, LLMs that are developed for, or adapted, modified or aimed at specific medical purposes are likely to qualify as medical devices,” wrote Johan Ordish, head of software and AI at the MHRA.
However, the guidance goes further than this: even if an LLM is not specifically designed or adapted for medical use, a developer simply claiming that it “could” be used for medical purposes would likely be enough for it to qualify as a medical device. It is not clear whether OpenAI’s boast that GPT-4 passed medical exams falls under this rule, or whether a claim about medical capabilities would need to be more explicit.
Ordish wrote that the regulation of LLMs, especially in the medical field, would be complex, partly because of difficulties in documentation, but that they would not be exempt from medical device requirements if they were found to be used and promoted as useful for healthcare.
“The MHRA remains open-minded about how best to assure LLMs, but any medical device must have evidence that it is safe under normal conditions of use and performs as intended, as well as meeting other applicable medical device regulatory requirements,” he explained. “We are committed to working with all our stakeholders to find solutions where possible, and to communicate our regulatory updates to developers.”