| ROC | |
|---|---|
| Name | Receiver operating characteristic |
| Caption | Example of a receiver operating characteristic curve |
| Field | Signal detection theory, Statistics, Machine learning |
| Introduced | 1940s (World War II radar research) |
| Common uses | Diagnostic testing, Radar, Remote sensing, Medical imaging |
ROC
Receiver operating characteristic (ROC) analysis is a graphical and analytical tool used in Signal detection theory, Statistics, Machine learning, Diagnostic test evaluation, and Remote sensing to assess binary classifier performance. It summarizes the trade-off between the true positive rate and the false positive rate across decision thresholds, facilitating comparison among models such as Logistic regression, Support vector machine, and Random forest classifiers. Widely applied in fields such as Medical imaging, Epidemiology, Radar systems, and Credit scoring, it underpins methods for threshold selection, model calibration, and decision analysis.
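The threshold trade-off described above can be illustrated with a minimal Python sketch (the scores and labels are made-up toy data, not from any cited study):

```python
# Minimal sketch: true/false positive rates of a score-based binary
# classifier at a given decision threshold. Toy data only.

def rates_at_threshold(scores, labels, threshold):
    """Return (tpr, fpr) when predicting positive iff score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Lowering the threshold raises both rates: the ROC trade-off.
print(rates_at_threshold(scores, labels, 0.75))  # strict threshold
print(rates_at_threshold(scores, labels, 0.25))  # lenient threshold
```

Sweeping the threshold from strict to lenient traces the ROC curve from (0, 0) toward (1, 1).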
Origins trace to mid-20th-century work on Radar signal detection during World War II and to psychophysics studies by researchers associated with Signal detection theory at institutions including Harvard University and Bell Labs. Postwar adoption in Meteorology and Remote sensing expanded its use, while formal statistical treatments emerged from decision theory in the tradition of John von Neumann and from later statistical work influenced by Bradley Efron. The area under the curve (AUC) concept was popularized in diagnostic medicine by researchers at Johns Hopkins University and in machine learning via conferences such as NeurIPS and ICML.
Key quantities include the true positive rate (sensitivity), the false positive rate (1 − specificity), and threshold-dependent classification rules as used in Logistic regression and Linear discriminant analysis. The area under the ROC curve (AUC) gives a scalar summary equivalent, up to normalization, to the Mann–Whitney U statistic, and relates to the concordance measures used in assessing the Cox proportional hazards model. The concepts of likelihood ratio, decision threshold, and cost-weighted errors connect to the Neyman–Pearson lemma and to applied ROC analysis in studies registered on ClinicalTrials.gov.
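The AUC/Mann–Whitney connection can be sketched directly: the AUC equals the probability that a randomly chosen positive instance outscores a randomly chosen negative one (with ties counted as one half). A minimal illustration on toy data:

```python
# Minimal sketch of the AUC / Mann-Whitney U relationship: AUC is the
# probability that a random positive outscores a random negative,
# counting ties as 1/2. Toy data only.

def auc_mann_whitney(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc_mann_whitney(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.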
Constructing a curve typically involves scoring instances with models such as Naive Bayes, Gradient boosting machine, or Neural network classifiers, then plotting sensitivity against the false positive rate across score thresholds. Empirical curves derive from ranked score lists and can be smoothed using Kernel density estimation or parametric fits such as the binormal model used in Psychometrics. Interpretation often involves comparing curves with nonparametric tests such as the DeLong test, and using the partial AUC for regions of practical interest, as applied in publications in American Medical Association journals and proceedings of IEEE conferences.
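The empirical construction from a ranked score list can be sketched as follows, together with the standard trapezoidal estimate of the AUC (toy data again; real pipelines typically use a library implementation):

```python
# Minimal sketch of building an empirical ROC curve from ranked scores,
# plus a trapezoidal AUC estimate. Toy data only.

def roc_points(scores, labels):
    """Return (fpr, tpr) points, sweeping the threshold high -> low."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def trapezoid_auc(points):
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(trapezoid_auc(pts))
```

In the absence of tied scores, the trapezoidal AUC agrees exactly with the Mann–Whitney formulation.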
Beyond the AUC, related metrics include the partial AUC, Youden's J statistic, and the Net Reclassification Improvement (NRI), which appear in guideline studies from the European Society of Cardiology and the American Heart Association. Calibration measures such as the Brier score, and discrimination indices such as the C-statistic in survival analysis, relate to ROC-based assessment in research from National Institutes of Health-funded groups. Precision–recall curves are often compared with ROC curves under class imbalance, a topic discussed in papers presented at KDD and ICML.
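Youden's J statistic, mentioned above, is J = sensitivity + specificity − 1 = TPR − FPR, maximized over thresholds; the maximizing threshold is a common threshold-selection rule. A minimal sketch on toy data:

```python
# Minimal sketch of Youden's J statistic: J = max over thresholds of
# (sensitivity + specificity - 1) = max(TPR - FPR). Toy data only.

def youdens_j(scores, labels):
    """Return (best J, threshold achieving it), predicting positive
    when score >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = 0.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_j, best_t = j, t
    return best_j, best_t

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(youdens_j(scores, labels))
```

Geometrically, J is the maximum vertical distance between the ROC curve and the chance diagonal.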
In Medical imaging, ROC analysis evaluates diagnostic modalities such as Magnetic resonance imaging, Computed tomography, and Mammography in multicenter trials coordinated by agencies such as the National Cancer Institute. In Radar engineering, ROC analysis underlies detection thresholds for systems developed by organizations including Raytheon and Lockheed Martin. Ecological remote sensing studies using MODIS or Landsat products employ ROC curves for land-cover classification accuracy assessment, as cited in work from NASA and USGS. Financial institutions use ROC-based AUC for credit-scoring validation in reports from Federal Reserve-linked research groups.
Limitations include the insensitivity of the full AUC to the clinically relevant operating range, potential misinterpretation under severe class imbalance, and the lack of direct incorporation of costs or prevalence without extensions such as decision curve analysis, endorsed in BMJ methodological papers. Extensions include cost-weighted ROC analysis, covariate-adjusted ROC curves, ROC surfaces for ordinal outcomes, time-dependent ROC curves for censored survival outcomes popularized in studies from Johns Hopkins University and the University of Pennsylvania, and multiclass generalizations using one-vs-rest schemes discussed at NeurIPS and in Springer textbooks.
Category:Statistical charts