Receiver operating characteristic

The receiver operating characteristic (ROC) is a graphical tool and analytical framework originating in World War II radar analysis and psychophysics used to evaluate binary classifiers and diagnostic tests. It displays trade-offs between true positive rate and false positive rate across decision thresholds, enabling performance comparison across models, tests, and instruments from fields as diverse as NASA remote sensing, FDA medical device approval, WHO epidemiology, IBM machine learning, and DARPA signal processing. ROC methods connect to foundational work by Jerzy Neyman, Egon Pearson, Alan Turing, and Claude Shannon and are central to modern analytics developed at institutions like Stanford University, Massachusetts Institute of Technology, and Google Research.
ROC emerged from early United Kingdom and United States radar research in the 1940s and was formalized in psychology and signal detection theory studies by researchers influenced by Harvard University and Yale University laboratories. It was later adopted in biostatistics at Johns Hopkins University and integrated into machine learning practice at University of California, Berkeley and Carnegie Mellon University. Major textbooks from authors affiliated with Princeton University and Stanford University popularized ROC in the context of classification, diagnostics, and information theory.
Key definitions include true positive rate (sensitivity), false positive rate (1 − specificity), positive predictive value, and negative predictive value; these are estimated from contingency tables often used in clinical trials overseen by FDA or NIH. Related concepts draw on likelihood ratios and hypothesis testing traditions from Jerzy Neyman and Egon Pearson and on information measures from Claude Shannon. ROC analysis assumes a scoring classifier or continuous measurement and compares distributions of scores for positive and negative classes, a perspective developed in statistical signal processing at Bell Labs and AT&T.
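As a minimal sketch of how these quantities follow from a 2×2 contingency table, the Python function below computes the four rates from raw counts. The function name `confusion_metrics` and the example counts are illustrative assumptions, not taken from any particular study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return the four rates derived from a 2x2 contingency table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives.
    """
    return {
        "tpr": tp / (tp + fn),  # true positive rate (sensitivity)
        "fpr": fp / (fp + tn),  # false positive rate (1 - specificity)
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }


# Hypothetical counts: 100 diseased and 100 healthy subjects.
metrics = confusion_metrics(tp=80, fp=10, fn=20, tn=90)
```

Sensitivity and false positive rate are each computed within one true class, so they do not depend on prevalence. The two predictive values mix both classes and therefore change when prevalence changes.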
Constructing an ROC curve requires ranking cases by a score and computing sensitivity and false positive rate at each threshold, a procedure common in bioinformatics pipelines at European Bioinformatics Institute and Broad Institute. Graphical interpretation includes convexity, monotonicity, and comparisons to a random classifier line (diagonal) rooted in probabilistic ideas propagated at University of Cambridge and University of Oxford. Operating points are chosen using costs or prevalence considerations relevant to policy bodies like the World Health Organization or regulatory agencies like the FDA.
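The ranking procedure can be sketched in a few lines of Python. This is an illustrative implementation, not a reference one: the function name `roc_points` and the toy data are assumptions, labels are taken to be 1 for positive and 0 for negative, and higher scores are assumed to indicate the positive class.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) pairs from (0, 0) to (1, 1), one per distinct score.

    Each point corresponds to the rule "predict positive when the score is
    at or above this threshold". Tied scores are consumed together, so a
    tie produces a single point rather than an arbitrary ordering.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
```

Lowering the threshold can only add cases to the predicted-positive set, so both coordinates are non-decreasing along the returned list. This is the monotonicity property of the curve.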
Area under the ROC curve (AUC) summarizes overall discriminative ability and is equivalent to the normalized Mann–Whitney U statistic, introduced by Henry Mann and Donald Whitney at Ohio State University: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. Youden's J index (sensitivity + specificity − 1) is used to choose thresholds in clinical chemistry and public health screening programs endorsed by CDC and WHO. Other summaries include partial AUC, Gini coefficient adaptations (Gini = 2 × AUC − 1) used in credit scoring at International Monetary Fund-linked institutions, and concordance statistics used in epidemiology research at Johns Hopkins University.
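The link between AUC and the Mann–Whitney U statistic can be illustrated by counting concordant pairs directly, and Youden's J can be maximized by scanning candidate thresholds. The sketch below assumes labels coded 1/0 and higher scores for positives. The function names `auc_pairwise` and `youden_threshold` are hypothetical, and the quadratic pair loop is chosen for clarity rather than efficiency.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win, matching the normalized Mann-Whitney U.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when score >= threshold. When several
    thresholds share the maximum J, the lowest one is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_num = None, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        # J = tp/n_pos - fp/n_neg, compared as an integer numerator
        # so that ties are detected exactly.
        num = tp * n_neg - fp * n_pos
        if best_num is None or num > best_num:
            best_t, best_num = t, num
    return best_t, best_num / (n_pos * n_neg)


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_value = auc_pairwise(labels, scores)         # 8 of 9 pairs concordant
best_t, best_j = youden_threshold(labels, scores)
```

On this toy data, eight of the nine positive-negative pairs are ordered correctly, giving an AUC of 8/9. Two thresholds (0.6 and 0.8) reach the same maximum J of 2/3, which shows that J alone does not always identify a unique operating point.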
AUC estimators have sampling distributions that permit confidence intervals and hypothesis tests; resampling methods such as bootstrap and cross-validation—popularized by researchers at Stanford University and University of California, Berkeley—are commonly applied. DeLong's test and ROC regression frameworks trace to work from statistical departments at Columbia University and Harvard University; Bayesian ROC modeling has been advanced by groups at University of Oxford and University College London.
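A percentile bootstrap is one simple way to attach an interval to an AUC estimate. The sketch below is illustrative only: the names `auc` and `bootstrap_auc_ci`, the default of 2000 resamples, and the choice to discard resamples containing a single class are all assumptions, and other bootstrap variants (stratified, BCa) may be preferable in practice.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC.

    Cases are resampled with replacement. Resamples that contain only
    one class are skipped because AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Toy data; real analyses need far more cases for a useful interval.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3, 0.2]
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
```

With very small samples the interval will be wide and the percentile method can be poorly calibrated, so the toy data here only demonstrates the mechanics.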
Precision–recall curves are preferred under class imbalance, a practice used in computer vision projects at Facebook AI Research and DeepMind. Partial AUC focuses on clinically relevant ranges and is applied in evaluations performed at NIH and FDA centers. Multiclass ROC generalizations, such as one-vs-rest and volume under surface (VUS), are implemented in large-scale systems at Google Research, Microsoft Research, and OpenAI.
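To illustrate why precision–recall curves can be more informative under imbalance, the sketch below computes (recall, precision) pairs at each distinct threshold for a toy data set with two positives among ten cases. The function name `pr_points` and the data are assumptions made for illustration.

```python
def pr_points(labels, scores):
    """Return (recall, precision) pairs, one per distinct score threshold.

    A case is predicted positive when its score is at or above the
    threshold. Tied scores are processed together.
    """
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


# Imbalanced toy data: 2 positives, 8 negatives.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
pr_curve = pr_points(labels, scores)
```

At the third threshold both positives are recovered with a single false positive, so the false positive rate is only 1/8 while precision has already fallen to 2/3. With rarer positives the gap widens, which is the effect that an ROC curve alone tends to hide.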
ROC methods are widely used in medical diagnosis for imaging modalities developed at Mayo Clinic and Massachusetts General Hospital, in remote sensing for NASA earth observation projects, in credit risk modeling in banking institutions like Goldman Sachs and JPMorgan Chase, and in information retrieval systems pioneered at Yahoo! and Microsoft Research. Practical considerations include sample size, prevalence, decision costs, calibration of predicted probabilities, and fairness audits conducted by teams at Harvard University and MIT Media Lab. Software implementations are available from communities around R, Python, SciPy, scikit-learn, and TensorFlow used in research at University of Toronto and Carnegie Mellon University.