ChatGPT, the AI chatbot everyone is talking about, can often give reliable answers to questions about breast cancer, a new study finds. But it’s not yet ready to replace your physician.
The big caveat, researchers said, is that the information is not always trustworthy, and sometimes tells only a small part of the story. So at least for now, they said, take your medical questions to your human doctor.
ChatGPT is a chatbot driven by artificial intelligence technology that allows it to have human-like conversations — instantly generating responses to just about any prompt a person can cook up. Those responses are based on the chatbot’s “pre-training” with a massive amount of data, including information gathered from the internet.
The technology was launched in November 2022, and within two months it had a record-setting 100 million monthly users, according to a report from the investment bank UBS.
ChatGPT has also made headlines by reportedly acing the SAT college admission exam, and even passing the U.S. medical licensing exam.
Despite the suggestion that the chatbot could pass for a doctor, it’s still far from clear whether it provides users with trustworthy medical information.
The new study, published April 4 in the journal Radiology, tested the chatbot’s ability to answer some “fundamental” questions on breast cancer screening and prevention.
Overall, it found, the technology provided appropriate answers 88% of the time, or for 22 of the 25 questions asked. Whether that would beat a Google search, or your doctor, is hard to say.
But the accuracy rate is “pretty impressive,” said senior researcher Dr. Paul Yi, an assistant professor of diagnostic radiology and nuclear medicine at the University of Maryland School of Medicine.
That said, Yi also pointed to the limits of ChatGPT as it stands. For one, he said, when the subject is health and medicine, even a 10% error rate could be harmful.
Beyond that, the appeal of ChatGPT — its ability to quickly assemble an array of data into a “chat” — is also its downside. Its responses to complex questions, Yi said, are limited in scope. So even when they are technically correct, they can give a slanted picture.
Yi’s team found that was true when they asked ChatGPT about breast cancer screening. The response offered the recommendations of the American Cancer Society only — omitting those of other medical groups, which in some cases differ.
And the average ChatGPT user, Yi said, may not know enough to ask follow-up questions, or know how to check whether the response is accurate at all.
Yi said he thinks the conversational nature of ChatGPT is an advantage of the technology, versus an old-fashioned internet search.
“The downside is, you can’t really verify if the information is accurate,” he said.
Of course, Yi noted, the accuracy of online information has always been an issue. The difference with ChatGPT is in how it’s presented. And the technology’s appeal — that conversational tone — can also be pretty “convincing,” Yi said.
“As with any new technology,” he said, “I think people should take it with a grain of salt.”
For the study, Yi’s team pulled together 25 questions that patients commonly ask about breast cancer prevention and screening, then presented them to ChatGPT. Each question was asked three times, to see whether and how the responses varied.
Overall, the chatbot gave appropriate answers to 22 questions, and unreliable responses to three. For one question — “Do I need to plan my mammogram around my COVID vaccination?” — it gave outdated information. For two others, the answers were inconsistent across the three tests.
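For readers who want to check the arithmetic, the tally above can be reproduced in a few lines of Python. This is only an illustrative sketch, not the researchers' actual grading method: the label names and the `summarize` helper are assumptions made for the example, and only the counts (22 appropriate, one outdated, two inconsistent) come from the study as reported here.

```python
from collections import Counter

# Hypothetical per-question labels. Only the counts come from the article:
# 22 appropriate, 1 outdated, 2 inconsistent across the three repeated asks.
grades = ["appropriate"] * 22 + ["outdated"] * 1 + ["inconsistent"] * 2


def summarize(grades):
    """Tally graded questions and compute the share judged appropriate."""
    counts = Counter(grades)
    total = len(grades)
    appropriate = counts["appropriate"]
    return {
        "total": total,
        "appropriate": appropriate,
        "unreliable": total - appropriate,
        "appropriate_pct": 100 * appropriate / total,
        "unreliable_pct": 100 * (total - appropriate) / total,
    }


summary = summarize(grades)
print(summary)
```

On these counts, the share of unreliable answers works out to 12%, slightly above the 10% error rate that Yi said could already be harmful in a health setting.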
One of those questions — “How do I prevent breast cancer?” — was broad and complex, with lots of information (factual and not) floating around the internet.
And that’s key, said Subodha Kumar, a professor of statistics, operations and data science at Temple University’s Fox School of Business in Philadelphia.
The more precise the question, he said, the more reliable the response will be. When the topic is complex, and data sources are plentiful and in some cases questionable, responses will be less trustworthy and likely more biased.
And the more complicated the topic, Kumar said, the more likely ChatGPT is to “hallucinate.” That’s a term used to describe the chatbot’s documented tendency to “make stuff up,” he noted.
Kumar, who was not involved in the new study, stressed that the answers ChatGPT doles out are only as good as the information it was, and continues to be, fed. “And there’s no guarantee it will only be fed accurate information,” he said.
As time goes on, the chatbot will gather more data, including from users, Kumar noted — so it’s possible the accuracy could worsen instead of improve.
“When the subject is health care, that can be dangerous,” he said.
Both researchers said they think ChatGPT and similar technologies hold great promise. To Kumar, the chatbot could, for instance, be a good “assistance device” to doctors looking to quickly get some information on a topic — but who would also have the knowledge to put a response in perspective.
“For the average consumer,” Kumar said, “I would advise against using this for health care information.”
SOURCES: Paul H. Yi, MD, assistant professor, diagnostic radiology and nuclear medicine, University of Maryland School of Medicine, Baltimore; Subodha Kumar, PhD, MBA, professor, statistics, operations and data science, Temple University Fox School of Business, Philadelphia; Radiology, April 4, 2023, online
Copyright © 2023 HealthDay. All rights reserved.