Assessing ChatGPT's performance in national nuclear medicine specialty examination: An evaluative analysis

Document Type : Original Article

Authors

1 Department of Biophysics, Faculty of Medical Sciences, Medical University of Silesia, Zabrze, Poland

2 Professor Zbigniew Religa Student Scientific Association, Department of Biophysics, Faculty of Medical Sciences, Medical University of Silesia, Zabrze, Poland

3 Wielospecjalistyczny Szpital Powiatowy S.A. im. dr B. Hagera, Pyskowicka 47-51, 42-612 Tarnowskie Góry, Poland

4 Department of Medical and Molecular Biology, Faculty of Medical Sciences, Medical University of Silesia, Zabrze, Poland

Abstract

Introduction: The rapid development of artificial intelligence (AI) has prompted interest in analysing its potential applications in medicine. The aim of this article is to evaluate the performance of the ChatGPT advanced language model against the pass threshold of the Polish National Specialty Examination (PES) in nuclear medicine, and to identify the model's strengths and limitations through an in-depth analysis of the issues raised in the exam questions.
Methods: The study used the PES exam provided by the Centre for Medical Examinations in Łódź, consisting of 120 questions. The questions were submitted via the openai.com platform, which provides free access to the GPT-3.5 model. All questions were classified according to Bloom's taxonomy to determine their complexity and difficulty, and according to two subcategories defined by the authors. To assess the model's confidence in its answers, each question was asked five times in independent sessions.
Results: ChatGPT scored 56%, below the 60% pass threshold, and therefore did not pass the exam. Of the 117 questions asked, 66 were answered correctly. No statistically significant differences were found in the percentage of correct answers across question types and subtypes.
Conclusion: Further testing using questions provided by the Centre for Medical Examinations from the nuclear medicine specialty exam is needed to evaluate the utility of the ChatGPT model. This opens the door for further research on upcoming improved versions of ChatGPT.
