| Angoff method | |
|---|---|
| Name | Angoff method |
| Type | Standard-setting procedure |
| Developed | 1971 |
| Developer | William H. Angoff |
| Used for | Criterion-referenced tests, licensure, certification |
The Angoff method is a widely used standard-setting procedure for determining cut scores on assessments. It combines expert judgment with psychometric data to establish minimum-competency thresholds for professional licensure, certification, and other high-stakes examinations. Practitioners typically follow guidance from organizations such as the Educational Testing Service, the American Educational Research Association, and the National Council on Measurement in Education.
The Angoff approach solicits item-level judgments from subject-matter experts, each of whom estimates the probability that a minimally competent candidate would answer a given item correctly. Panels are typically drawn from a range of academic institutions and practice settings to ensure diverse perspectives. Judgments are averaged across panelists and summed across items, often alongside empirical test statistics supplied by the testing program, to produce a recommended cut score. In professional licensure, the procedure operates within guidelines set by credentialing bodies such as the National Board of Medical Examiners.
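The basic computation can be illustrated directly. The following Python sketch uses invented ratings for a hypothetical three-judge, five-item panel: each item's probability estimates are averaged across judges, and the item means are summed to give the recommended raw cut score. The data and variable names are illustrative only.

```python
import numpy as np

# Hypothetical ratings: rows are panelists, columns are test items.
# Each entry is a judge's estimate of the probability that a
# minimally competent candidate answers the item correctly.
ratings = np.array([
    [0.60, 0.75, 0.40, 0.85, 0.55],
    [0.65, 0.70, 0.45, 0.80, 0.50],
    [0.55, 0.80, 0.35, 0.90, 0.60],
])

item_means = ratings.mean(axis=0)   # consensus probability per item
cut_score = item_means.sum()        # expected raw score of the minimally competent candidate
percent_cut = 100 * cut_score / ratings.shape[1]

print(f"Item means: {np.round(item_means, 3)}")
print(f"Recommended cut score: {cut_score:.2f} of {ratings.shape[1]} items")
print(f"Percent-correct standard: {percent_cut:.1f}%")
```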
The method was formalized by William H. Angoff in a 1971 chapter of Educational Measurement and gained prominence through adoption by testing organizations, including the Educational Testing Service, where Angoff worked, and the National Board of Medical Examiners. Its development paralleled broader advances in psychometrics, and it subsequently diffused throughout professional credentialing, including legal, medical, and nursing certification programs.
The core Angoff procedure involves convening a panel of experts, defining the minimally competent candidate, rating each item, aggregating the item probabilities, and setting the cut score. Panels typically combine academic faculty, practicing professionals, and stakeholders from relevant regulatory agencies. The Modified Angoff, which adds feedback and discussion between rating rounds, is the most common variation; the Bookmark method and the Contrasting Groups method are alternative standard-setting procedures against which Angoff results are sometimes triangulated. Iterative rounds incorporate normative feedback, empirical item performance data from prior administrations, and statistical review by psychometricians, as sketched below.
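A minimal sketch of an iterative Modified Angoff round follows, under the simplifying (and purely illustrative) assumption that judges revise each estimate 30% of the way toward the empirical item difficulty shown to them as feedback; all figures are invented.

```python
import numpy as np

# Hypothetical two-round Modified Angoff session: 3 judges, 4 items.
round1 = np.array([
    [0.70, 0.50, 0.80, 0.40],
    [0.60, 0.55, 0.85, 0.45],
    [0.75, 0.45, 0.90, 0.35],
])

# Empirical proportion-correct values from a prior administration,
# shared with judges as normative feedback between rounds (assumed data).
empirical_p = np.array([0.65, 0.60, 0.88, 0.30])

# Round 2: judges revise after discussion; here revision is modeled as
# each judge moving 30% of the way toward the empirical difficulties.
round2 = round1 + 0.3 * (empirical_p - round1)

for label, r in (("Round 1", round1), ("Round 2", round2)):
    print(f"{label} cut score: {r.mean(axis=0).sum():.2f} / {r.shape[1]}")
```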
Statistical interpretation treats the aggregated item probabilities as estimates of the expected score of the minimally competent candidate; the cut score equals the sum of those expectations. Reliability concerns prompt calculation of standard errors and variance components, drawing on classical test theory and generalizability theory. Inter-rater agreement is typically summarized with kappa statistics or intraclass correlations. Simulation studies examine bias, sampling variability, and the impact of panel composition, while applied research explores the consequences of alternative cut scores for pass/fail rates.
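One common way to quantify judge-to-judge variability is to treat each panelist's summed ratings as that panelist's implied cut score and to compute a standard error across panelists. The sketch below uses hypothetical ratings and assumes independent judges; a full generalizability analysis would partition the variance more finely.

```python
import numpy as np

ratings = np.array([  # judges x items, hypothetical probabilities
    [0.60, 0.75, 0.40, 0.85, 0.55],
    [0.65, 0.70, 0.45, 0.80, 0.50],
    [0.55, 0.80, 0.35, 0.90, 0.60],
    [0.70, 0.65, 0.50, 0.75, 0.45],
])

judge_cuts = ratings.sum(axis=1)   # each judge's implied cut score
cut = judge_cuts.mean()            # panel-recommended cut score
se = judge_cuts.std(ddof=1) / np.sqrt(len(judge_cuts))  # SE over judges

print(f"Judge cut scores: {np.round(judge_cuts, 2)}")
print(f"Cut score: {cut:.2f}  (SE over judges: {se:.2f})")
```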
Applications span professional licensure (e.g., programs administered by the National Council of Architectural Registration Boards and the American Institute of Certified Public Accountants), certification examinations such as Project Management Institute credentials, and high-stakes admissions tests administered by organizations such as the College Board and the Educational Testing Service. Critics point to the subjectivity of the judgments, panelist selection bias, and potential legal exposure where cut scores affect employment or licensure decisions. Empirical critiques in the measurement literature, including National Academy of Sciences reports, have prompted calls for triangulating Angoff results against alternative standard-setting methods.
Best practices recommend a clear definition of the minimally competent candidate, representative panels that include both practitioners and academics, structured panelist training, and multiple rating rounds with anonymized item statistics. Independent psychometric oversight, for example from members of the Association of Test Publishers, helps ensure defensibility, and thorough documentation of the process strengthens transparency. Panels should use reliability analyses and simulation checks, such as the resampling sketch below, to refine their decisions.
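As an example of such a simulation check, the sketch below bootstraps over panelists to estimate how sensitive the recommended cut score, and the resulting pass rate, are to the particular judges who served. The panel ratings and candidate scores are randomly generated stand-ins, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel ratings (8 judges x 40 items) and examinee scores.
ratings = rng.uniform(0.3, 0.9, size=(8, 40))
candidate_scores = rng.binomial(40, 0.62, size=500)  # simulated examinees

# Bootstrap over panelists: resample judges with replacement and
# recompute the cut score and implied pass rate each time.
boot_cuts, boot_pass = [], []
for _ in range(2000):
    sample = ratings[rng.integers(0, len(ratings), len(ratings))]
    cut = sample.mean(axis=0).sum()
    boot_cuts.append(cut)
    boot_pass.append((candidate_scores >= cut).mean())

print(f"Cut score: {np.mean(boot_cuts):.2f} "
      f"(95% interval {np.percentile(boot_cuts, 2.5):.2f}-"
      f"{np.percentile(boot_cuts, 97.5):.2f})")
print(f"Pass rate: {np.mean(boot_pass):.1%} "
      f"(95% interval {np.percentile(boot_pass, 2.5):.1%}-"
      f"{np.percentile(boot_pass, 97.5):.1%})")
```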
Category:Psychometrics