| CART (classification and regression tree) | |
|---|---|
| Name | CART (classification and regression tree) |
| Caption | Decision tree schematic |
| Developer | Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone |
| Initial release | 1984 |
| Related | ID3, C4.5 |
| Programming languages | Fortran, S, R |
| Genre | Machine learning, Statistics |
CART (classification and regression tree) is a nonparametric decision tree learning technique that predicts responses by recursively partitioning the feature space into increasingly homogeneous subsets. Developed and popularized by Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone in the 1980s, CART produces binary trees used for both categorical classification and continuous regression tasks. Its algorithmic foundations influenced ensemble methods and software implementations in environments such as R, Python, and the S language ecosystem.
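As an illustration of such implementations, scikit-learn's tree module provides an optimized version of CART; the following minimal sketch fits one classification tree and one regression tree (the iris data, `max_depth=3`, and other settings are illustrative assumptions, not part of the original CART software):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_text

# Fit a CART-style classification tree (Gini impurity, binary splits).
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Inspect the learned binary splits as text.
print(export_text(clf, feature_names=load_iris().feature_names))

# The same API grows a regression tree by minimizing squared error;
# regressing on the numeric class label here is purely illustrative.
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3, random_state=0)
reg.fit(X, y)
```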
CART emerged from collaborations among statisticians at institutions including the University of California, Berkeley, Stanford University, and the University of California, Los Angeles, and was presented in the monograph "Classification and Regression Trees" (1984). The method belongs to the broader lineage of decision tree learners that includes ID3, C4.5, and later ensemble techniques such as bagging and random forests. CART’s influence extends to applied fields represented by organizations such as the National Institutes of Health, NASA, and the World Bank, where decision trees support interpretability in domains ranging from clinical trials to remote sensing.
CART constructs binary trees by selecting, at each node, a predictor and a threshold that split the dataset into two child nodes, continuing recursively until stopping criteria are met (a minimal sketch follows below). The procedure is implemented in software packages used at institutions such as Los Alamos National Laboratory, IBM, and Microsoft Research, and follows a top-down, greedy strategy similar in spirit to techniques developed by researchers at Bell Labs and AT&T. Training uses impurity measures (see below) and handles missing data with surrogate splits, a technique explored in collaborations involving the Cleveland Clinic and academic groups at Harvard University and Princeton University. CART models are often used as base learners in ensemble frameworks developed by teams at the University of Toronto and Carnegie Mellon University.
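A minimal sketch of that top-down, greedy loop for a regression tree, assuming numeric predictors only and omitting surrogate splits and pruning (the function and parameter names here are illustrative, not from the original CART code):

```python
import numpy as np

def grow_tree(X, y, depth=0, max_depth=5, min_samples=10):
    """Greedy, top-down CART-style growth: pick the single best
    (feature, threshold) pair, split into two children, and recurse."""
    node = {"prediction": y.mean(), "n": len(y)}  # regression leaf value
    if depth >= max_depth or len(y) < min_samples:
        return node                               # stopping criteria met

    best = None
    for j in range(X.shape[1]):                   # try every predictor
        for t in np.unique(X[:, j])[:-1]:         # and every threshold
            left = X[:, j] <= t
            # squared-error cost of the candidate split
            cost = (((y[left] - y[left].mean()) ** 2).sum()
                    + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or cost < best[0]:
                best = (cost, j, t, left)

    if best is None:
        return node
    _, j, t, left = best
    node.update(feature=j, threshold=t,
                left=grow_tree(X[left], y[left], depth + 1, max_depth, min_samples),
                right=grow_tree(X[~left], y[~left], depth + 1, max_depth, min_samples))
    return node
```

Classification growth is identical in structure, except that the leaf stores class proportions and the squared-error cost is replaced by an impurity measure such as Gini, described below.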
For classification, CART commonly optimizes the Gini impurity, choosing splits that maximize the purity of the resulting child nodes; contemporaneous work at Columbia University and Yale University compared Gini to the information gain used in ID3. For regression, CART minimizes squared error (variance reduction) at each split, a principle shared with least-squares estimators from University of Chicago econometrics groups. Theoretical analyses and comparisons have been advanced by researchers at the Massachusetts Institute of Technology, Stanford University, and the University of California, Berkeley, linking impurity measures to bias–variance tradeoffs examined in symposia sponsored by the American Statistical Association and the Institute of Mathematical Statistics.
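For concreteness, a small sketch of both criteria, where the Gini impurity is one minus the sum of squared class proportions and the regression criterion is the variance of the responses (the helper names are illustrative):

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gain(y, left_mask, impurity=gini):
    """Impurity decrease of a binary split: parent impurity minus the
    size-weighted impurities of the two children. Pass impurity=np.var
    to obtain the regression (variance-reduction) criterion instead."""
    n, n_left = len(y), left_mask.sum()
    return (impurity(y)
            - (n_left / n) * impurity(y[left_mask])
            - ((n - n_left) / n) * impurity(y[~left_mask]))
```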
To prevent overfitting, CART employs cost-complexity pruning (also called weakest-link pruning), which uses a complexity parameter alpha to balance tree size against misclassification or squared-error cost; academic evaluations have been reported in venues such as NeurIPS, ICML, and KDD. Cross-validation routines for selecting the pruning parameter are implemented in statistical software such as R, SAS, and SPSS, and have been applied in large-scale studies at CERN and the European Space Agency. The pruning framework relates to regularization concepts developed in research by groups at Columbia University and influenced subsequent model-selection work at Princeton University and ETH Zurich.
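A hedged sketch of that selection procedure with scikit-learn, whose `cost_complexity_pruning_path` enumerates the weakest-link subtree sequence for the criterion R_alpha(T) = R(T) + alpha * (number of leaves of T); the breast-cancer dataset and `cv=5` below are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Enumerate the nested subtree sequence produced by weakest-link pruning:
# each ccp_alpha is the value at which one more subtree collapses.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick alpha by cross-validation, as the text describes.
scores = {a: cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                             X, y, cv=5).mean()
          for a in path.ccp_alphas}
best_alpha = max(scores, key=scores.get)
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```

Larger alpha values prune more aggressively; cross-validation picks the value that best trades tree size against predictive cost.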
CART served as a foundation for numerous extensions: ensemble methods such as random forests and gradient boosting combine many CART-like trees (a bagging sketch follows below), and boosting algorithms were advanced by researchers at the University of Toronto and at companies such as Google and Microsoft. Oblique decision trees, developed in research labs at MIT and the California Institute of Technology, allow linear combinations of features at splits, while model trees and conditional inference trees developed at the Max Planck Institute and the University of Hohenheim integrate statistical tests into splitting. CART variants have been adapted for survival analysis in clinical research at Johns Hopkins University and the Mayo Clinic, for high-dimensional genomics at the Broad Institute, and for spatial data used by European Commission environmental programs.
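As a sketch of how such ensembles reuse CART as a base learner, bagging fits many trees on bootstrap resamples of the training data and averages their predictions (the synthetic data and `n_estimators=100` below are illustrative assumptions, not a recommended configuration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

# Bagging: each CART-style tree sees a bootstrap resample of the data,
# and the ensemble prediction averages the individual trees.
ensemble = BaggingRegressor(
    estimator=DecisionTreeRegressor(),  # CART-style base learner
    n_estimators=100,
    bootstrap=True,
    random_state=0,
).fit(X, y)
```

Random forests add one further ingredient on top of bagging: each split considers only a random subset of the predictors, which decorrelates the trees.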
CART has been applied across disciplines: in medicine for prognostic modeling at the Mayo Clinic and Memorial Sloan Kettering Cancer Center, in ecology for species distribution studies conducted by Smithsonian Institution researchers, in finance for credit scoring at banks such as JPMorgan Chase and Bank of America, and in remote sensing projects at NASA and the European Space Agency. Practical examples include diagnostic decision support systems developed at the Cleveland Clinic, customer segmentation deployed by Amazon and Walmart, and fraud detection systems explored by teams at PayPal and Mastercard. CART’s interpretability makes it a standard pedagogical example in courses at the Massachusetts Institute of Technology, Stanford University, the University of Oxford, and the University of Cambridge.