How Much is Enough: A Large-Scale Study of Formative Assessment
Overview
This study tackles a practical question that every educator faces: how many of a course's quiz questions does it take to predict how a student will perform on the final exam? These ongoing quiz questions, known as formative assessments, are assessment for learning: they help both students and teachers gauge understanding and adjust their approach while the course is still in progress. The final exam, by contrast, is a summative assessment, or assessment of learning, which measures what students have achieved by the end. Analyzing more than 20,000 student enrollments across 127 course runs in the biomedical sciences, we found that once students have completed just 40% of a course's quiz questions, their scores already correlate strongly (Pearson r > 0.7) with their final exam performance. The result has immediate practical implications for course design and for early intervention systems.
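The core measurement in the overview, correlating a student's score on the first portion of a course's quiz questions with their final exam score, can be sketched in Python as follows. This is a minimal illustration under assumed conditions, not the study's actual pipeline: the data layout (one list of 0/1 first-attempt scores per student, in course order, with every question answered), the decile granularity, the toy data, and the helper names `pearson_r`, `correlation_by_decile`, and `first_fraction_above` are all hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def correlation_by_decile(formative, final_scores):
    """Correlate prefix quiz scores with final exam scores at 10%, 20%, ..., 100%.

    formative: hypothetical layout, one list of 0/1 first-attempt scores per
        student, in course order, every student answering every question.
    final_scores: final exam score per student, in the same student order.
    Returns {completion_pct: pearson_r}.
    """
    n_questions = len(formative[0])
    curve = {}
    for decile in range(1, 11):
        # Number of questions in the first `decile` tenths of the course.
        k = max(1, round(n_questions * decile / 10))
        prefix_means = [sum(scores[:k]) / k for scores in formative]
        curve[decile * 10] = pearson_r(prefix_means, final_scores)
    return curve

def first_fraction_above(curve, threshold=0.7):
    """Smallest completion percentage whose correlation exceeds `threshold`."""
    for pct in sorted(curve):
        if curve[pct] > threshold:
            return pct
    return None

# Toy data: four students, ten questions each (made up, not from the study).
formative = [
    [1] * 10,
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0] * 10,
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
]
final_scores = [1.0, 0.5, 0.0, 0.5]
curve = correlation_by_decile(formative, final_scores)
print(first_fraction_above(curve))
```

With real course data, the returned percentage is the point at which the prefix score first clears the chosen correlation threshold; the r > 0.7 default simply mirrors the figure quoted above.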
Technical Notes
We analyzed 831,065 assessment attempts from 16,707 course completers across 15 biomedical science courses delivered via HMX, an online learning platform, focusing on the correlation between intermediate formative assessment scores and final exam performance. The key finding was consistent: correlation coefficients reached 95% of their maximum value after students had completed 40-60% of the formative assessments, with most courses clustering around 40%. This percentage-based threshold held for both the longer Fundamentals courses (172-293 questions) and the shorter Pro courses (101-160 questions). When we randomized question order in a subset of eight Immunology courses, the predictive threshold actually improved, reaching significance after just 30% of assessments. Statistical validation using Fisher’s z-transformation confirmed that first-attempt scores were significantly better predictors than all-attempts scores at every decile (p < 0.05).
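Two of the analysis steps above, locating where a correlation curve reaches 95% of its maximum and comparing two correlations with Fisher's z-transformation, can be sketched as follows. The function names, the `{completion_pct: r}` curve format, and the example numbers are hypothetical, and the comparison uses the simple form for two independent correlations; the summary does not say which variant of the test the study applied.

```python
from math import atanh, erfc, sqrt

def fraction_reaching_share_of_max(curve, share=0.95):
    """Smallest completion percentage at which the correlation reaches
    `share` of the curve's maximum.

    curve: hypothetical {completion_pct: pearson_r} mapping; assumes the
        peak correlation is positive, as in the study's setting.
    """
    target = share * max(curve.values())
    for pct in sorted(curve):
        if curve[pct] >= target:
            return pct
    return None

def fisher_z_compare(r1, n1, r2, n2):
    """Two-sided z-test for r1 != r2 via Fisher's z-transformation.

    Uses the independent-samples form: z' = atanh(r), with standard error
    sqrt(1/(n1 - 3) + 1/(n2 - 3)) for the difference.
    Returns (z statistic, two-sided p-value).
    """
    diff = atanh(r1) - atanh(r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = diff / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    p = erfc(abs(z) / sqrt(2))
    return z, p

# Illustrative values only, not results from the study.
curve = {10: 0.40, 20: 0.60, 30: 0.70, 40: 0.77, 50: 0.78, 60: 0.80}
print(fraction_reaching_share_of_max(curve))
print(fisher_z_compare(0.75, 1000, 0.70, 1000))
```

Here `fisher_z_compare(0.75, 1000, 0.70, 1000)` stands in for comparing a first-attempt correlation against an all-attempts correlation at one decile. Because both correlations would be computed on the same students against the same final exam, a test for dependent correlations (such as Steiger's) would account for that overlap more precisely than this independent-samples form.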
My Reflections on This Work
While building HMX, I believed in the predictive potential of formative assessments. I expected the correlation between quiz performance and final exam scores to rise steadily as more assessments accumulated over a course. Validating this intuition robustly required significant scale. Years later, analyzing nearly a million assessment attempts and seeing exactly that pattern emerge was gratifying, as was the thought that this research could help guide course design decisions and the development of personalized learning.