From these applications, it is clear these models in their present form have real potential pitfalls. Simply predicting the next entry in a sequence of words is divorced from the "real" intelligence and critical appraisal of the medical decision-making process. The current state of the art sits in a sort of "uncanny valley," in which the output is near-human in its realism yet remains flawed enough to be disconcertingly unnatural. Likewise, the danger of confidently presented but imperfect natural language output is immediately obvious, requiring vigilant error-checking and potentially negating some of the advantages in time saved. Only an expert-level clinician may be capable of identifying subtle inaccuracies in clinical guidance, while catching transcription and coding errors may prove practically impossible, given the volume of content requiring review for validation.
ACEP Now: Vol 42 – No 06 – June 2023
These concerns aside, however, it is worth noting that the leap from GPT-3.5 to GPT-4 required only a few months of additional development while delivering a significant gain in performance. The teams developing and tuning these models are acutely aware of their issues and obstacles. Future versions are likely to have greater accuracy and error-checking abilities, as well as improved domain-specific generative abilities. Just a few months ago, these models were hardly part of the public consciousness, and these are only the initial steps in determining their potential applications and the refinements necessary. Even if these models are not quite ready for use today, their future use to augment decision making and productivity is inevitable.
Dr. Radecki (@emlitofnote) is an emergency physician and informatician with Christchurch Hospital in Christchurch, New Zealand. He is the Annals of Emergency Medicine podcast co-host and Journal Club editor.