
GPT-4 performed close to the level of expert doctors in eye assessments



As large language models (LLMs) continue to advance, so do questions about how they can benefit society in areas such as medicine. A study from the University of Cambridge's School of Clinical Medicine found that OpenAI's GPT-4 performed nearly as well in an ophthalmology assessment as experts in the field.

In the study, published in PLOS Digital Health, researchers tested the LLM, its predecessor GPT-3.5, Google's PaLM 2 and Meta's LLaMA with 87 multiple-choice questions. Five expert ophthalmologists, three trainee ophthalmologists and two unspecialized junior doctors received the same mock exam. The questions came from a textbook used to test trainees on everything from light sensitivity to lesions. The contents aren't publicly available, so the researchers believe the LLMs couldn't have been trained on them previously. ChatGPT, running GPT-4 or GPT-3.5, was given three chances to answer definitively; otherwise, its response was marked as null.

GPT-4 scored higher than the trainees and junior doctors, getting 60 of the 87 questions right. That was significantly higher than the junior doctors' average of 37 correct answers, and it just beat out the three trainees' average of 59.7. While one expert ophthalmologist answered only 56 questions accurately, the five experts averaged 66.4 right answers, beating the machine. PaLM 2 scored 49, and GPT-3.5 scored 42. LLaMA scored the lowest at 28, falling below the junior doctors. Notably, these trials occurred in mid-2023.
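For a quick sense of scale, the raw scores above can be converted into percentages of the 87-question exam. This is a minimal sketch using only the figures quoted in the article; the grouping labels are shorthand, not the study's own terminology:

```python
# Scores reported in the article, each out of 87 questions.
TOTAL_QUESTIONS = 87

scores = {
    "Expert ophthalmologists (avg)": 66.4,
    "GPT-4": 60,
    "Trainee ophthalmologists (avg)": 59.7,
    "PaLM 2": 49,
    "GPT-3.5": 42,
    "Junior doctors (avg)": 37,
    "LLaMA": 28,
}

# Print each group's accuracy as a percentage, highest first.
for name, correct in sorted(scores.items(), key=lambda kv: -kv[1]):
    pct = 100 * correct / TOTAL_QUESTIONS
    print(f"{name}: {correct}/{TOTAL_QUESTIONS} ({pct:.1f}%)")
```

Framed this way, GPT-4's 60 correct answers come to about 69% accuracy, a few points below the expert average of roughly 76% and just above the trainee average of about 69%.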

While these results point to potential benefits, there are also risks and concerns. The researchers noted that the study offered a limited number of questions, especially in certain categories, so the actual results might vary. LLMs also have a tendency to "hallucinate," or make things up. That's one thing if it's an irrelevant fact, but claiming there's a cataract or other serious condition is another story. As in many instances of LLM use, the systems also lack nuance, creating further opportunities for inaccuracy.




