{"id":1152512,"date":"2025-10-17T10:07:26","date_gmt":"2025-10-17T17:07:26","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/?post_type=msr-research-item&#038;p=1152512"},"modified":"2025-10-17T11:14:55","modified_gmt":"2025-10-17T18:14:55","slug":"leveraging-large-language-models-to-generate-multiple-choice-questions-for-ophthalmology-education","status":"publish","type":"msr-research-item","link":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/publication\/leveraging-large-language-models-to-generate-multiple-choice-questions-for-ophthalmology-education\/","title":{"rendered":"Leveraging Large Language Models to Generate Multiple-Choice Questions for Ophthalmology Education"},"content":{"rendered":"<div class=\"widget-ArticleFulltext widget-instance-AMA_Fulltext_Abstract\" data-widget-name=\"ArticleFulltext\" data-widget-instance=\"AMA_Fulltext_Abstract\">\n<div class=\"article-full-text\" data-userhasaccess=\"True\">\n<div id=\"AbstractSection\">\n<p><strong>Importance<\/strong>\u00a0\u00a0Multiple-choice questions (MCQs) are an important and integral component of ophthalmology residency training evaluation and board certification; however, high-quality questions are difficult and time-consuming to draft.<\/p>\n<p><strong>Objective<\/strong>\u00a0\u00a0To evaluate whether general-domain large language models (LLMs), particularly OpenAI\u2019s Generative Pre-trained Transformer 4 (GPT-4), can reliably generate high-quality, novel, and readable MCQs comparable to those of a committee of experienced examination writers.<\/p>\n<p><strong>Design, Setting, and Participants<\/strong>\u00a0\u00a0This survey study, conducted from September 2024 to April 2025, assesses LLM performance in generating MCQs based on the American Academy of Ophthalmology (AAO)\u00a0<i>Basic and Clinical Science Course<\/i>\u00a0(<i>BCSC<\/i>) compared with a committee of human experts. 
Ten expert ophthalmologists, who were masked to the generation source, independently evaluated MCQs using a 10-point Likert scale (1\u2009=\u2009extremely poor; 10\u2009=\u2009criterion standard quality) across 5 criteria: appropriateness, clarity and specificity, relevance, discriminative power, and suitability for trainees.<\/p>\n<p><strong>Intervention<\/strong>\u00a0\u00a0Relevant\u00a0<i>BCSC<\/i>\u00a0content and AAO question-writing guidelines were input into GPT-4o via Microsoft\u2019s Azure OpenAI Service, and structured prompts were used to generate MCQs.<\/p>\n<p><strong>Main Outcomes and Measures<\/strong>\u00a0\u00a0The primary outcomes were median scores, compared statistically using the bootstrapping method. Also reported are string similarity scores based on Levenshtein distance (0-100, with 100 indicating identical content) between LLM-MCQs and the entire\u00a0<i>BCSC<\/i>\u00a0question bank; the Flesch Reading Ease metric for readability; and the intraclass correlation coefficient (ICC) for interrater agreement.<\/p>\n<p><strong>Results<\/strong>\u00a0\u00a0The 10 graders had between 1 and 28 years of clinical experience in ophthalmology (median [IQR] experience, 6 years [3-15 years]). Questions generated by GPT-4 and a committee of experts received median scores of 9 and 9 in combined scores, appropriateness, clarity and specificity, and relevance (difference, 0; 95% CI, 0-0;\u00a0<i>P<\/i>\u2009>\u2009.99); 8 and 9 in discriminative power (difference, 1; 95% CI, \u22121 to 1;\u00a0<i>P<\/i>\u2009=\u2009.52); and 8 and 8 in suitability for trainees (difference, 0; 95% CI, \u22121 to 0;\u00a0<i>P<\/i>\u2009>\u2009.99), respectively. Nearly 95% of LLM-MCQs had similarity scores below 60, indicating that most LLM-MCQs had limited or no resemblance to existing content. 
Interrater reliability was moderate (ICC, 0.63;\u00a0<i>P<\/i>\u2009<\u2009.001), and mean (SD) readability scores were similar across sources (37.14 [22.54] vs 42.60 [22.84];\u00a0<i>P<\/i>\u2009>\u2009.99).<\/p>\n<p><strong>Conclusions and Relevance<\/strong>\u00a0\u00a0In this survey study, results indicate that an LLM could be used to develop ophthalmology board\u2013style MCQs and expand examination banks to further support ophthalmology residency training. Despite most questions having a low similarity score, the quality, novelty, and readability of the LLM-generated questions need to be further assessed.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Importance\u00a0\u00a0Multiple-choice questions (MCQs) are an important and integral component of ophthalmology residency training evaluation and board certification; however, high-quality questions are difficult and time-consuming to draft. 
Objective\u00a0\u00a0To evaluate whether general-domain large language models (LLMs), particularly OpenAI\u2019s Generative Pre-trained Transformer 4 (GPT-4), can reliably generate high-quality, novel, and readable MCQs comparable to those of a [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"JAMA Ophthalmology","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2025-10-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13553],"msr-publication-type":[193715],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[269148,269142],"msr-field-of-study":[],"msr-conferenc
e":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1152512","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-include-in-river"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2025-10-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"JAMA Ophthalmology","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/jamanetwork.com\/journals\/jamaophthalmology\/fullarticle\/2839639","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/doi.org\/10.1001\/jamaophthalmol.2025.3622","label_id":"243106","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Shahrzad Gholami","user_id":39757,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shahrzad Gholami"},{"type":"text","value":"Daniel B. 
Mummert","user_id":0,"rest_url":false},{"type":"text","value":"Beth Wilson","user_id":0,"rest_url":false},{"type":"text","value":"Sarah Page","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Rahul Dodhia","user_id":41401,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Rahul Dodhia"},{"type":"user_nicename","value":"Juan M. Lavista Ferres","user_id":39552,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Juan M. Lavista Ferres"},{"type":"user_nicename","value":"Bill Weeks","user_id":39582,"rest_url":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Bill Weeks"},{"type":"text","value":"Dale E. Fajardo","user_id":0,"rest_url":false},{"type":"text","value":"Karine D. Bojikian","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[696544],"msr_project":[778522],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"article","related_content":{"projects":[{"ID":778522,"post_title":"AI for Health","post_name":"ai-for-health","post_type":"msr-project","post_date":"2023-05-16 14:26:13","post_modified":"2024-10-14 15:42:21","post_status":"publish","permalink":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/project\/ai-for-health\/","post_excerpt":"AI for Health is a philanthropic program launched by Microsoft, which aims to support nonprofits, researchers, and organizations working on global health challenges. 
The program provides access to artificial intelligence (AI) technology and expertise in three main areas: population health, imaging analytics, genomics &amp; proteomics.","_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/778522"}]}}]},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1152512","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1152512\/revisions"}],"predecessor-version":[{"id":1152514,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1152512\/revisions\/1152514"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1152512"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1152512"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1152512"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1152512"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=1152512"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1152512"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2
\/msr-locale?post=1152512"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1152512"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1152512"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1152512"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1152512"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1152512"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1152512"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}