Automatic Generation of High-Quality MCQs With LLMs for Artificial Intelligence Education

Research output: Contribution to journal › Article › peer-review

Abstract

Modern education demands meticulous design of teaching and learning materials to foster effective learning. Multiple-choice questions (MCQs) are widely used in education worldwide due to their high scalability, efficient automated grading, and ability to evaluate conceptual understanding across a broad range of topics. However, creating high-quality MCQs that accurately match learning objectives, balance question types, levels of Bloom’s revised taxonomy, and difficulty, and avoid repetition of content is demanding and time-consuming even for experienced educators. Large Language Models (LLMs) with deep reasoning capabilities offer novel opportunities for automated MCQ generation. This paper introduces a pipeline that leverages LLMs to automatically generate high-quality MCQs for Artificial Intelligence (AI) education. Our method employs a zero-shot prompting strategy with different question types to guide LLMs in creating questions. To ensure reliability and mitigate inaccuracies, the pipeline integrates a novel Chain-of-Verification (CoVe) methodology, called Cross-CoVe, to systematically validate the generated content. The pipeline’s MCQs are rigorously evaluated on question quality, explanation quality, and degree of diversity in a case study in an AI course. Of the 200 MCQs generated, 94.5% had a single, correctly identified answer, and 79% provided robust explanations for both the correct answer and the distractors. The pipeline also achieved excellent diversity, with 99% of questions being non-duplicates. Notably, our Cross-CoVe verification strategy proved highly effective, correctly identifying 63.6% of flawed questions, a statistically significant improvement (p = 0.002) that more than doubles the performance of a self-verification baseline (27.3%).
Our work contributes to the intense research and discussion on AI-driven educational tools, highlighting the high potential of state-of-the-art LLMs with deep reasoning capabilities to assist rather than replace educators, especially in large-scale educational resource development.
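The paper's implementation details are not reproduced on this page, but the core idea of Cross-CoVe, having a different model verify a generated question rather than the generating model checking its own output, can be illustrated with a minimal Python sketch. All names here (`MCQ`, `cross_cove_verify`, `stub_verifier`) are hypothetical and chosen for illustration only; in practice the verifier callable would wrap a second LLM that answers the question blind, without seeing the keyed answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class MCQ:
    stem: str
    options: Dict[str, str]  # option label -> option text
    answer: str              # label of the keyed correct answer


def cross_cove_verify(mcq: MCQ, verifier: Callable[[str], str]) -> bool:
    """Cross-model check in the spirit of Cross-CoVe: a *separate* model
    answers the question without seeing the key; the question passes only
    if the verifier's independent answer matches the keyed answer."""
    prompt = mcq.stem + "\n" + "\n".join(
        f"{label}) {text}" for label, text in sorted(mcq.options.items())
    )
    return verifier(prompt).strip().upper() == mcq.answer


# Stub standing in for a second LLM (hypothetical, for demonstration).
def stub_verifier(prompt: str) -> str:
    return "B"


q = MCQ(
    stem="Which search algorithm is complete and optimal with unit step costs?",
    options={
        "A": "Depth-first search",
        "B": "Breadth-first search",
        "C": "Greedy best-first search",
        "D": "Hill climbing",
    },
    answer="B",
)
print(cross_cove_verify(q, stub_verifier))  # True: the verifier agrees with the key
```

A self-verification baseline would instead pass the prompt back to the generating model; the abstract's result (63.6% vs. 27.3% flaw detection) suggests that using an independent verifier is the decisive design choice.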

Original language: English
Pages (from-to): 184332-184347
Number of pages: 16
Journal: IEEE Access
Volume: 13
DOIs
Publication status: Published - 2025

Keywords

  • Bloom’s revised taxonomy
  • Large language models (LLMs)
  • chain-of-verification (CoVe)
  • multiple-choice questions (MCQs)
