Receiver operating characteristic

Name	Receiver operating characteristic
Caption	ROC curve illustration
Field	Statistics; Signal detection theory
Invented	1940s
Inventor	Radar analysis; Psychophysics
Key figures	Jerzy Neyman, Egon Pearson, Alan Turing, Claude Shannon

Contents

Introduction
Definitions and Basic Concepts
ROC Curve Construction and Interpretation
Summary Measures (AUC, Youden's J, etc.)
Statistical Properties and Inference
Extensions and Variants (Precision-Recall, Partial AUC, Multiclass)
Applications and Practical Considerations

The receiver operating characteristic (ROC) is a graphical tool and analytical framework for evaluating binary classifiers and diagnostic tests. It originated in World War II radar analysis and psychophysics. An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, which allows performance to be compared across models, tests, and instruments in fields as diverse as NASA remote sensing, FDA medical device approval, WHO epidemiology, IBM machine learning, and DARPA signal processing. ROC methods connect to foundational work by Jerzy Neyman, Egon Pearson, Alan Turing, and Claude Shannon and are central to modern analytics developed at institutions such as Stanford University, the Massachusetts Institute of Technology, and Google Research.

Introduction

ROC analysis emerged from radar research in the United Kingdom and the United States in the 1940s and was formalized in psychology and signal detection theory by researchers influenced by laboratories at Harvard University and Yale University. It was later adopted in biostatistics at Johns Hopkins University and integrated into machine learning practice at the University of California, Berkeley and Carnegie Mellon University. Major textbooks by authors affiliated with Princeton University and Stanford University popularized ROC analysis in the context of classification, diagnostics, and information theory.

Definitions and Basic Concepts

Key definitions include the true positive rate (sensitivity, TP / (TP + FN)), the false positive rate (1 − specificity, FP / (FP + TN)), the positive predictive value (TP / (TP + FP)), and the negative predictive value (TN / (TN + FN)). All four are estimated from 2 × 2 contingency tables of the kind often used in clinical trials overseen by the FDA or NIH. Sensitivity and the false positive rate are each computed within a single class and so do not depend on prevalence, whereas the two predictive values do. Related concepts draw on the likelihood ratio and hypothesis testing traditions of Jerzy Neyman and Egon Pearson and on the information measures of Claude Shannon. ROC analysis assumes a scoring classifier or continuous measurement and compares the distributions of scores for the positive and negative classes, a perspective developed in statistical signal processing at Bell Labs and AT&T.
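The definitions above can be illustrated with a minimal Python sketch. The function name `rates` and the example counts are hypothetical and chosen only for illustration; they do not come from any cited study.

```python
def rates(tp, fp, fn, tn):
    """Return the four basic rates from 2x2 contingency-table counts."""
    return {
        "tpr": tp / (tp + fn),  # sensitivity: share of positives detected
        "fpr": fp / (fp + tn),  # 1 - specificity: share of negatives flagged
        "ppv": tp / (tp + fp),  # positive predictive value (prevalence-dependent)
        "npv": tn / (tn + fn),  # negative predictive value (prevalence-dependent)
    }

# Illustrative counts: 100 positive cases and 900 negative cases.
m = rates(tp=80, fp=30, fn=20, tn=870)
print(m)
```

In this made-up table the sensitivity is 0.8, while the positive predictive value is only 80/110 because negatives outnumber positives nine to one.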

ROC Curve Construction and Interpretation

Constructing an ROC curve requires ranking cases by score and computing the sensitivity and false positive rate at each threshold, a procedure common in bioinformatics pipelines at the European Bioinformatics Institute and the Broad Institute. Graphical interpretation considers monotonicity, convexity (the ROC convex hull), and comparison with the diagonal line of a random classifier, all rooted in probabilistic ideas propagated at the University of Cambridge and the University of Oxford. Operating points are chosen using cost or prevalence considerations relevant to policy bodies such as the World Health Organization or regulatory agencies such as the FDA.
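The ranking procedure described above can be sketched as follows. This is a minimal illustration rather than a reference implementation; it assumes labels coded as 1 (positive) and 0 (negative), calls a case positive when its score is at or above the threshold, and uses a hypothetical function name `roc_points` and made-up scores.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points found by lowering the threshold step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all cases tied at this score together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Made-up scores for eight cases (four positive, four negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve starts at (0, 0), ends at (1, 1), and never decreases in either coordinate. A classifier that scores at random would stay close to the diagonal.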

Summary Measures (AUC, Youden's J, etc.)

The area under the ROC curve (AUC) summarizes overall discriminative ability. It equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which connects it to the Mann–Whitney U statistic developed by scholars associated with the University of Pennsylvania and Columbia University. Youden's J index (sensitivity + specificity − 1) is used to choose thresholds in clinical chemistry and in public health screening programs endorsed by the CDC and WHO. Other summaries include the partial AUC, Gini coefficient adaptations (Gini = 2 × AUC − 1) used in credit scoring at International Monetary Fund-linked institutions, and concordance statistics used in epidemiology research at Johns Hopkins University.
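A short sketch may help make these summaries concrete. It computes the AUC through its pairwise (Mann–Whitney) interpretation, derives the Gini coefficient as 2 × AUC − 1, and searches observed scores for the threshold that maximizes Youden's J. The function names and data are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        j = tp / n_pos - fp / n_neg  # TPR - FPR, the same quantity as J
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j

# Made-up scores for eight cases (four positive, four negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = auc_mann_whitney(scores, labels)  # 13 of 16 pairs ordered correctly
gini = 2 * auc - 1
t_star, j_star = youden_threshold(scores, labels)
print(auc, gini, t_star, j_star)
```

Several thresholds can share the same maximal J, as happens in this toy data set, so in practice the choice among them is usually settled by cost or prevalence considerations.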

Statistical Properties and Inference

AUC estimators have sampling distributions that permit confidence intervals and hypothesis tests. Resampling methods such as the bootstrap and cross-validation, popularized by researchers at Stanford University and the University of California, Berkeley, are commonly applied. DeLong's test, which compares the AUCs of correlated ROC curves, and ROC regression frameworks trace to work from statistics departments at Columbia University and Harvard University; Bayesian ROC modeling has been advanced by groups at the University of Oxford and University College London.
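As one illustration of resampling-based inference, the sketch below builds a percentile bootstrap confidence interval for the AUC. It is a simplified example under stated assumptions: resampling is stratified by class so that every replicate contains both classes, the interval is the plain percentile interval, and the function names, defaults, and data are hypothetical. It does not implement DeLong's test.

```python
import random

def auc(pos, neg):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling within each class."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    stats = sorted(
        auc(rng.choices(pos, k=len(pos)), rng.choices(neg, k=len(neg)))
        for _ in range(n_boot)
    )
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper

# Made-up scores for eight cases (four positive, four negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
low, high = bootstrap_auc_ci(scores, labels)
print(f"95% bootstrap interval for AUC: [{low:.3f}, {high:.3f}]")
```

With only eight cases the interval is very wide, which reflects the dependence of AUC precision on sample size.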

Extensions and Variants (Precision-Recall, Partial AUC, Multiclass)

Precision–recall curves are often preferred under class imbalance, when the positive class is rare, a practice used in computer vision projects at Facebook AI Research and DeepMind. The partial AUC restricts the area to a clinically relevant range of false positive rates and is applied in evaluations performed at NIH and FDA centers. Multiclass ROC generalizations, such as one-vs-rest averaging and the volume under the surface (VUS), are implemented in large-scale systems at Google Research, Microsoft Research, and OpenAI.
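The effect of class imbalance on the two kinds of curve can be sketched with a small example. The code below computes precision–recall points and average precision (the mean of the precision values at the rank of each positive case) on made-up data with two positives among ten cases. The function names are hypothetical, and tied scores are not treated specially.

```python
def pr_points(scores, labels):
    """Return (recall, precision) after each case, going from highest score down."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points, tp = [], 0
    for rank, (_, y) in enumerate(ranked, start=1):
        tp += y
        points.append((tp / n_pos, tp / rank))
    return points

def average_precision(scores, labels):
    """Mean of the precision values observed at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / tp

# Imbalanced toy data: the positives sit at ranks 1 and 4 of 10.
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
ap = average_precision(scores, labels)
points = pr_points(scores, labels)
print(ap, points[:4])
```

On this data 14 of the 16 positive/negative pairs are ordered correctly, so the AUC is 0.875, yet precision has already fallen to 0.5 by the time both positives are recovered. That gap is the usual argument for reporting precision–recall summaries when positives are rare.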

Applications and Practical Considerations

ROC methods are widely used in medical diagnosis for imaging modalities developed at the Mayo Clinic and Massachusetts General Hospital, in remote sensing for NASA earth observation projects, in credit risk modeling at banking institutions such as Goldman Sachs and JPMorgan Chase, and in information retrieval systems pioneered at Yahoo! and Microsoft Research. Practical considerations include sample size, prevalence, decision costs, calibration of predicted probabilities, and fairness audits conducted by teams at Harvard University and the MIT Media Lab. Software implementations are available from the communities around R, Python, SciPy, scikit-learn, and TensorFlow, which are used in research at the University of Toronto and Carnegie Mellon University.
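How prevalence and decision costs interact when choosing an operating point can be sketched in a few lines. The example below picks the threshold that minimizes the expected cost per case, computed as prevalence × cost of a false negative × false negative rate plus (1 − prevalence) × cost of a false positive × false positive rate. The function name, cost values, and data are all hypothetical.

```python
def min_cost_threshold(scores, labels, prevalence, cost_fp, cost_fn):
    """Return (threshold, expected cost per case) minimizing expected cost."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    # Candidates: every observed score, plus "flag nothing" (infinite threshold).
    for t in sorted(set(scores)) + [float("inf")]:
        fnr = sum(s < t for s in pos) / len(pos)   # missed positives
        fpr = sum(s >= t for s in neg) / len(neg)  # flagged negatives
        cost = prevalence * cost_fn * fnr + (1 - prevalence) * cost_fp * fpr
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Made-up scores for eight cases (four positive, four negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Target population with 10% prevalence; a miss costs 5x a false alarm.
t_mild, cost_mild = min_cost_threshold(scores, labels, 0.1, cost_fp=1, cost_fn=5)
# Same population, but a miss now costs 50x a false alarm.
t_severe, cost_severe = min_cost_threshold(scores, labels, 0.1, cost_fp=1, cost_fn=50)
print(t_mild, cost_mild, t_severe, cost_severe)
```

Raising the cost of a missed positive moves the preferred threshold lower, trading more false alarms for fewer misses, even though the ROC curve itself is unchanged.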
