Abstract
Modern education demands meticulous design of teaching and learning materials to foster effective learning. Multiple-choice questions (MCQs) are widely used in education worldwide due to their high scalability, efficient automated grading, and ability to assess conceptual understanding across a broad range of topics. However, creating high-quality MCQs that accurately match learning objectives, balance question types, Bloom’s revised taxonomy levels, and difficulty levels, and avoid repetition of content is demanding and time-consuming even for experienced educators. Large Language Models (LLMs) with deep reasoning capabilities offer novel opportunities for automated MCQ generation. This paper introduces a pipeline that leverages LLMs to automatically generate high-quality MCQs for Artificial Intelligence (AI) education. Our method employs a zero-shot prompting strategy with different question types to guide LLMs in creating questions. To ensure reliability and mitigate inaccuracies, the pipeline integrates a novel Chain-of-Verification (CoVe) methodology, called Cross-CoVe, that systematically validates the generated content. The MCQs produced by our pipeline are rigorously evaluated for MCQ quality, explanation quality, and diversity in a case study in an AI course. Of 200 generated MCQs, 94.5% had a single, correctly identified answer, and 79% provided robust explanations for both the correct answer and the distractors. The pipeline also achieved excellent diversity, with 99% of questions being non-duplicates. Notably, our Cross-CoVe verification strategy proved highly effective, correctly identifying 63.6% of flawed questions, a statistically significant improvement (p = 0.002) that more than doubles the performance of a self-verification baseline (27.3%).
Our work contributes to the active research and discussion on AI-driven educational tools, highlighting the potential of state-of-the-art LLMs with deep reasoning capabilities to assist rather than replace educators, especially in large-scale educational resource development.
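The generate-then-verify loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`generator_llm`, `verifier_llm`, `cross_cove_check`) and the stubbed, deterministic "model" responses are assumptions standing in for real LLM calls. The core Cross-CoVe idea shown is that a second, independent model re-answers each generated question, and a mismatch with the intended answer flags the MCQ as flawed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MCQ:
    stem: str
    options: dict        # option label -> option text
    answer: str          # label of the intended correct option
    explanation: str     # rationale for the answer and distractors

def generator_llm(topic: str) -> MCQ:
    # Stand-in for a zero-shot prompted LLM that drafts an MCQ on `topic`;
    # here it returns a fixed example question for demonstration.
    return MCQ(
        stem=f"Which algorithm is complete and cost-optimal for {topic} "
             "with uniform step costs?",
        options={"A": "Depth-first search", "B": "Breadth-first search",
                 "C": "Greedy best-first search", "D": "Hill climbing"},
        answer="B",
        explanation="BFS expands nodes in order of depth, so with uniform "
                    "step costs it is both complete and cost-optimal.",
    )

def verifier_llm(question_text: str) -> str:
    # Stand-in for a *different* model that answers the question from
    # scratch, without seeing the generator's intended answer.
    return "B"

def cross_cove_check(mcq: MCQ) -> bool:
    # Cross-CoVe sketch: cross-model verification. The MCQ passes only if
    # the independent model's answer matches the intended answer.
    return verifier_llm(mcq.stem) == mcq.answer

def generate_verified_mcq(topic: str) -> Optional[MCQ]:
    mcq = generator_llm(topic)
    return mcq if cross_cove_check(mcq) else None
```

In a real pipeline the verifier would be a separate LLM (or prompt chain) issuing targeted verification questions, and rejected MCQs would be regenerated or sent to an educator for review.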
| Original language | English |
|---|---|
| Pages (from-to) | 184332-184347 |
| Number of pages | 16 |
| Journal | IEEE Access |
| Volume | 13 |
| DOIs | |
| Publication status | Published - 2025 |
Keywords
- Bloom’s revised taxonomy
- Large language models (LLMs)
- Chain-of-verification (CoVe)
- Multiple-choice questions (MCQs)
Fingerprint
Dive into the research topics of 'Automatic Generation of High-Quality MCQs With LLMs for Artificial Intelligence Education'. Together they form a unique fingerprint.