What Can ChatGPT Offer to Medicine?

Ute Eppinger

February 27, 2023

A few weeks ago, ChatGPT passed the US Medical Licensing Examination (USMLE), which all physicians must pass in order to practice in the United States. The chatbot also aced a microbiology quiz designed by microbiologist Alex Berezow, PhD; according to Berezow, the 10 questions were suitable for a college-level final examination. And at Stanford University, a sizeable proportion of students used ChatGPT in their final examinations.

"The results ultimately show that large linguistic models, on which ChatGPT was trained, have the potential to assist in medical training and even clinical decision-making," posited Tiffany H. Kung, MD, a resident tutor at the Harvard School of Medicine, Boston, Massachusetts, and her colleagues, who investigated the performance of ChatGPT in the USMLE in their study.

Since the US startup OpenAI made its chatbot prototype ChatGPT freely accessible to the public in November 2022, the text-based dialogue system's potential applications have been creating quite a stir. They include text generation, translation, and automated paperwork. According to estimates, ChatGPT had already reached more than 100 million users by the beginning of February.

Because ChatGPT stands to change many industries and spheres of life for good, it is arousing both hopes and fears worldwide, and there is a great deal of uncertainty. New York City's public school district has banned ChatGPT. Is this the right decision? Scientists from the Technical University of Munich (TUM) and the Ludwig Maximilian University of Munich maintain that the ban is "incorrect and too convenient a solution." In their position paper, they argue that language models such as ChatGPT could lead to greater educational equality.

Enkelejda Kasneci, PhD, Liesel Beckmann Distinguished Professor of Educational Sciences at TUM and coordinator of the position paper, has called the development of language models such as ChatGPT a technological milestone. "There is no turning back. The tools are out in the world, they will get better, and we must learn how to use them constructively."

Templates Already Available

A lot is happening in the development of modern language models, confirmed Jens Kleesiek, MD, PhD, physician and computer scientist at the Institute for Artificial Intelligence in Medicine of Essen University Hospital in Germany, at the ChatGPT in Healthcare event. "It's all happening in one fell swoop," said Kleesiek. Besides OpenAI, Google has announced its own chatbot, Bard, as a direct response to ChatGPT.

It is worth creating an OpenAI account and trying out for oneself the extent to which the chatbot can already help write medical reports, draft informed consent forms, and respond to patient queries. When doing so, it is important to word prompts (inputs) as precisely as possible and to check and correct the responses.
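As a rough illustration of such a precisely worded prompt, the sketch below sends one through OpenAI's Python client instead of the web interface. It is a minimal sketch only: the model name, the placeholder API key, and the prompt wording are assumptions, and any draft the model returns would still need to be checked and corrected by a physician.

    # Minimal sketch: asking an OpenAI model to turn a finding into a
    # patient-friendly explanation. Model name and API key are placeholders;
    # the output is a draft that must be reviewed before any clinical use.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    prompt = (
        "You are assisting a physician. Explain the following finding to a patient "
        "in plain, non-alarming language, in no more than five sentences: "
        "Gram stain of cerebrospinal fluid shows Gram-negative diplococci."
    )

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # assumed model; availability may differ
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,  # keep the wording conservative
    )

    print(response["choices"][0]["message"]["content"])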

On PromptBase, one can find instructions and purchase ready-to-use prompts. ChatGPT already provides a range of surprisingly useful templates.

For example, one of the questions from Berezow's microbiology quiz was the following: "A patient presents at the emergency room with a severe headache and stiff neck. The physician prescribes a lumbar puncture to collect cerebrospinal fluid. A Gram stain of the cerebrospinal fluid shows the presence of Gram-negative diplococci. What is the diagnosis?"

ChatGPT answered correctly: "On the basis of the information you provided, the Gram stain of the cerebrospinal fluid shows the presence of Gram-negative diplococci, which are bacteria that are typically oval-shaped and occur in pairs. This finding is consistent with the diagnosis of meningitis."

Limitations Remain

Responses such as these make one quickly forget that modern artificial intelligence (AI) is not intelligence in the usual sense of the word. Rather, it is pattern recognition. It compiles sentences on the basis of probability calculations. As a result, ChatGPT has limitations.

OpenAI itself points out that ChatGPT can generate responses that sound plausible but that are false or nonsensical. The model also reacts sensitively to changes in the input or to multiple attempts with the same input request.

Additionally, ChatGPT often responds circuitously, utilizes certain formulations too often, and likes to use clichés. "These are all things that we do not want in medicine," said Kleesiek.

Unknown Sources

One significant limitation is that it is currently impossible to know which sources the AI draws on when formulating a specific response, said Ute Schmid, PhD, at the event ChatGPT and Other Language Models: Between Hype and Controversy. Schmid leads the Cognitive Systems Working Group at the Faculty of Computer Science at the University of Bamberg, Germany.

Using the example of a medical report, Kleesiek noted that, because of these limitations, a language model must meet the following challenges:

  • Facts must be presented reliably and concisely.

  • For patient safety, the suggested medication and dosage must be correct.

  • Using ChatGPT must save time when composing the report and must be well integrated into the workflow.

  • Questions on liability, data protection, and copyright must be resolved.

In a commentary in Nature, Claudi L. Bockting, PhD, professor of clinical psychology at the University of Amsterdam, and her colleagues list the following five aspects that should be considered in the further development and research of ChatGPT:

  1. Insisting on human verification of responses

  2. Developing rules for accountability

  3. Investing in truly open language models (depending on how they are trained, models carry a certain bias from the manufacturer and are a potential target for opinion-making)

  4. Embracing the benefits of AI

  5. Widening the debate and handling the technology critically

Kleesiek sees a great many potential applications in medicine for ChatGPT and similar tools, such as the following:

  • Structuring of data (retrospective/during input)

  • Filtering of data

  • Summarizing the medical history (provided that reliability can be ensured)

  • Taking the case history (interactively with the patient)

  • Conveying information in language tailored to the patient

  • "Translation" of findings

  • Literature research

  • Replacing some conversations with nursing staff

  • Medical writing

  • Linking with generative image models

Kleesiek describes the combination of ChatGPT with other AI algorithms as "very exciting" for medicine. In a study recently published in Radiology, researchers examined the extent to which ChatGPT can improve the interpretability of computer-aided diagnosis (CAD) in mammography. By integrating ChatGPT into a CAD system, specific patients or images can be queried in natural language. The approach can also be used to gain data-supported insights about existing guidelines and to discover potentially new image-based biomarkers.

"When using AI-based technologies such as ChatGPT, it is important to proceed carefully," wrote the study authors. Despite the challenges, they see a "great deal of potential" for the technology to support clinical decisions and even improve the expediency of imaging procedures.

Applications Under Investigation

Kleesiek presented two studies that used a transformer language model of the same type as OpenAI's Generative Pre-trained Transformer 3 (GPT-3). In the first study, the language model was used to quickly find specific information in findings reports. An example of a prompt would be, "Does the patient have an infection?"

"We see that the model does not then respond freely but understandably instead. We then highlight this information in the text to achieve appropriate traceability and a certain reliability," explained Kleesiek. In this way, it can be understood that nothing has been imagined or fabricated and that the responses are based on fact.

The study "Information Extraction From Weakly Structured Radiological Reports With Natural Language Queries" is currently undergoing review.

A study that has already been published assessed therapeutic response on the basis of radiologic findings. The idea is that chatbots or language models can be used to summarize a complex medical history.

"This was to find, in the event of a tumor disease, whether there was a worsening, an improvement, or a partial therapeutic response. From this, we found that if there are unequivocal findings, the machine performs just as well as radiologists," said Kleesiek.

"But what about in the event of inconclusive findings? Inconclusive would be, for example, if a patient had a lesion in the lung and one in the liver, one is getting larger and the other smaller." This was more difficult than unequivocal findings for radiologists, too. "But we have seen that the machine's evaluation performance drops significantly more than that of radiologists in the event of inconclusive findings. It must be looked at critically," said Kleesiek.

Clinical Practice

Schmid now wants to examine whether ChatGPT could be used for named-entity recognition in medical reports. Named-entity recognition is a task in computational linguistics. Its objective is to automatically recognize named entities and assign them to predefined categories.

The information in the medical reports is not so easily accessible because it is not structured in digital form. "This may not seem like a difficult problem to us. From a medical report, we can ascertain the diagnosis, whether the patient is male or female, what preexisting conditions are present, whether a specialist is involved in the treatment, and much more." The crucial difference is that people process semantically, whereas ChatGPT and similar models are based on pattern recognition and on pattern processing, said Schmid.
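As a rough sketch of what named-entity recognition involves, the example below runs a general-purpose spaCy model over an invented report snippet and prints the recognized entities with their categories. A clinical application would require a model trained on medical text; the model name and the snippet here are assumptions for illustration only.

    # Toy named-entity recognition with a general-purpose spaCy model.
    # A real medical use case would need a model trained on clinical text;
    # the report snippet below is invented for illustration.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # general-purpose English model

    report = (
        "Mrs. Miller, 62, was admitted to Essen University Hospital on 3 February "
        "with suspected pneumonia and a history of type 2 diabetes."
    )

    doc = nlp(report)
    for ent in doc.ents:
        # Each entity is assigned a predefined category, e.g. PERSON, ORG, DATE.
        print(f"{ent.text:35} -> {ent.label_}")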

Kleesiek is certain that "there is still much more to come" in the development of ChatGPT and other language models. He noted that ChatGPT, as it currently functions, is not yet ready for use in clinical practice.

Schmid considers sociotechnical integration to be important. "I believe that ChatGPT represents a field of opportunity for medicine." However, the tool should not be used, for example, as a source of advice for lovesick teenagers. "If [someone] were to enter, 'I feel so terrible, I want to kill myself,' and then GPT-3 says, 'I am sorry to hear that. I can help you with that,' " warned Schmid.

Pattern recognition, on which ChatGPT and other language models are based, is responsible for the disastrous sentence, "I can help you with that," given in response to an expression of suicidal ideation. On the internet, chatbots are mostly used for customer service inquiries. In that kind of data, the phrase "I want" is most frequently followed by "I can help," so "I can help you (with that)" is the statistically most likely continuation.
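A toy example with invented counts makes the mechanism concrete: if a model merely tracks which reply most often follows a given phrase in its training data, "I can help you with that" wins for "I want" regardless of what is actually being asked.

    # Toy illustration of frequency-based response selection.
    # The corpus and counts are invented; real chatbots are far more complex,
    # but the idea of picking the most probable continuation is the same.
    from collections import Counter, defaultdict

    corpus = [
        ("I want to change my order", "I can help you with that"),
        ("I want a refund", "I can help you with that"),
        ("I want to cancel my subscription", "I can help you with that"),
        ("I want to speak to a human", "Let me connect you to an agent"),
    ]

    followers = defaultdict(Counter)
    for utterance, reply in corpus:
        key = " ".join(utterance.split()[:2])  # e.g. "I want"
        followers[key][reply] += 1

    # The "best" reply is simply the most frequent one in the data --
    # no understanding of the content is involved.
    print(followers["I want"].most_common(1)[0])
    # ('I can help you with that', 3)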

This article was translated from the Medscape German edition.
