LLMpedia: The first transparent, open encyclopedia generated by LLMs

Multidimensional scaling

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: SOM Hop 4
Expansion Funnel Raw 69 → Dedup 0 → NER 0 → Enqueued 0
Multidimensional scaling
Name: Multidimensional scaling
Type: Statistical technique
Introduced: 1930s–1950s
Developers: Warren Torgerson, Gale Young, A. S. Householder; nonmetric variants by Roger Shepard and Joseph Kruskal
Applications: psychometrics, market research, bioinformatics, geography

Multidimensional scaling (MDS) is a set of statistical techniques for visualizing the similarity of individual cases in a dataset by representing objects as points in a low-dimensional space. It aims to place each object in Euclidean space so that the between-object distances match given dissimilarities as closely as possible. The classical method was formalized by Warren Torgerson in the 1950s, building on the 1938 theorem of Gale Young and A. S. Householder; nonmetric variants were developed by Roger Shepard and Joseph Kruskal at Bell Labs in the 1960s, and John Sammon introduced a related nonlinear mapping in 1969.

Overview

Multidimensional scaling converts a matrix of pairwise dissimilarities into a spatial configuration; practitioners in psychometrics, sociology, marketing research, genomics, and geography use it to reveal latent structure. Classical MDS, also called Torgerson scaling or principal coordinates analysis (after Gower), is closely related to principal component analysis, developed by Karl Pearson and Harold Hotelling, while the nonmetric variants grew out of the work of Shepard and Kruskal at Bell Labs. MDS both competes with and complements other dimensionality-reduction and ordination methods used in cartography, ecology, and modern machine learning.

Mathematical formulation

Given a symmetric dissimilarity matrix D for n objects, MDS seeks an embedding X in R^p minimizing a loss function that compares the Euclidean distances among the rows of X to the entries of D. Classical MDS is obtained by double-centering the matrix of squared dissimilarities and performing an eigen-decomposition of the resulting matrix, a construction justified by the Young–Householder theorem. Metric MDS typically minimizes a stress function introduced by Kruskal, with a distance-weighted variant due to Sammon; nonmetric MDS replaces the observed dissimilarities with a monotone transformation estimated by isotonic regression, so that only their rank order is fit. Constrained forms incorporate weighting matrices and regularization terms.
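The classical double-centering construction can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular package's implementation; the function name `classical_mds` is chosen here for clarity.

```python
import numpy as np

def classical_mds(D, p=2):
    """Classical (Torgerson) MDS: embed n objects in R^p from a symmetric
    dissimilarity matrix D via double-centering and eigen-decomposition."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:p]       # keep the top-p eigenpairs
    L = np.sqrt(np.maximum(eigvals[idx], 0))  # clip tiny negative eigenvalues
    return eigvecs[:, idx] * L                # coordinates X, shape (n, p)

# Distances among collinear points are reproduced exactly by a 1-D embedding.
pts = np.array([[0.0], [1.0], [3.0], [6.0]])
D = np.abs(pts - pts.T)
X = classical_mds(D, p=1)
D_hat = np.abs(X - X.T)
```

For genuinely Euclidean dissimilarities, the recovered distances `D_hat` match `D` up to floating-point error; for non-Euclidean input, negative eigenvalues appear and are clipped, which is where the approximation enters.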

Algorithms and variants

Algorithms for obtaining MDS solutions include the classical closed-form eigenvalue method, iterative numerical optimization, and stochastic techniques. Classical MDS requires a single eigen-decomposition, while metric MDS employs gradient-based or majorization solvers from numerical optimization. Nonmetric MDS is commonly solved by alternating monotone regression with majorization steps, as in SMACOF (Scaling by MAjorizing a COmplicated Function), developed principally by Jan de Leeuw and collaborators, including researchers at the University of Groningen. Related embedding methods include Sammon mapping; Isomap, introduced by Joshua Tenenbaum, Vin de Silva, and John Langford; locally linear embedding (LLE), developed by Sam Roweis and Lawrence Saul; and t-distributed stochastic neighbor embedding (t-SNE), introduced by Laurens van der Maaten and Geoffrey Hinton. These approaches share goals with MDS but differ in cost functions and neighborhood definitions. Large-scale and sparse variants exploit landmark points and randomized linear algebra.
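The majorization idea behind SMACOF can be sketched with the Guttman transform, each application of which never increases the raw stress. The sketch below assumes unweighted stress and a random start; the names `smacof` and `raw_stress` are illustrative, not a library API.

```python
import numpy as np

def raw_stress(X, D):
    """Raw stress: sum over pairs i<j of (d_ij(X) - D_ij)^2."""
    fit = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    iu = np.triu_indices_from(D, k=1)
    return float(((fit[iu] - D[iu]) ** 2).sum())

def smacof(D, p=2, n_iter=200, seed=0):
    """Minimal SMACOF sketch: iterate the Guttman transform X <- B(X) X / n."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.standard_normal((n, p))
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        # Ratio D_ij / d_ij(X), with the convention 0 where d_ij(X) = 0.
        ratio = np.where(dist > 0, D / np.where(dist > 0, dist, 1.0), 0.0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        X = B @ X / n                          # Guttman transform
    return X

# Four corners of a unit square are exactly embeddable in 2-D.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X0 = np.random.default_rng(0).standard_normal((4, 2))
s0 = raw_stress(X0, D)
X_fit = smacof(D, p=2, n_iter=200, seed=0)
s1 = raw_stress(X_fit, D)
```

The monotone-decrease guarantee of the Guttman transform is what makes SMACOF attractive compared with plain gradient descent on stress, which needs step-size tuning.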

Applications

MDS has a long record of applications across domains: in psychometrics, for perceptual mapping of experimental stimuli; in marketing research, for brand and product positioning maps; in ecology and bioinformatics, for ordination of species-abundance and gene-expression data; in geography and cartography, for reconstructing spatial layouts from dissimilarity information such as travel times; and in archaeology and cognitive science, for studying artifact similarity and mental representation. It also appears in machine learning pipelines and in neuroscience for visualizing neural population activity.

Evaluation and goodness-of-fit

Goodness-of-fit for MDS solutions is assessed with stress measures such as Kruskal's stress-1, Sammon's error, and, for classical MDS, the proportion of variance explained by the leading eigenvalues. Shepard diagrams, which plot fitted distances against observed dissimilarities, are the standard diagnostic: a tight monotone relationship indicates a good fit. Model selection between dimensionalities often uses the elbow of the stress-versus-dimension curve, cross-validation, or information criteria.
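Kruskal's stress-1 is simple enough to compute directly. The sketch below (with an illustrative function name, `kruskal_stress1`) contrasts a perfect embedding, which has stress 0, with a deliberately distorted one.

```python
import numpy as np

def kruskal_stress1(D_obs, X):
    """Kruskal's stress-1: sqrt(sum (d_ij - delta_ij)^2 / sum d_ij^2)
    over pairs i<j, where d_ij are distances in the embedding X and
    delta_ij are the observed dissimilarities. Lower is better; 0 is perfect."""
    D_fit = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    iu = np.triu_indices_from(D_obs, k=1)      # upper triangle, i < j
    num = ((D_fit[iu] - D_obs[iu]) ** 2).sum()
    den = (D_fit[iu] ** 2).sum()
    return float(np.sqrt(num / den))

# A 3-4-5 right triangle embedded by its own coordinates has zero stress;
# squashing the vertical axis distorts the distances and raises it.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
s_perfect = kruskal_stress1(D, pts)
s_distorted = kruskal_stress1(D, pts * np.array([1.0, 0.2]))
```

The same fitted-versus-observed pairs used in the numerator are exactly the points plotted in a Shepard diagram, so the two diagnostics come at no extra cost.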

Practical considerations and implementation

Implementations of MDS are available across statistical software ecosystems: R provides cmdscale in base R, isoMDS in the MASS package, and the smacof package, while Python provides sklearn.manifold.MDS in scikit-learn. Practical concerns include handling missing data, weighting dissimilarities, choosing dimensionality, and avoiding local minima, the latter commonly addressed with multiple random restarts or a classical-MDS initialization. Visualization of MDS outputs integrates with plotting tools such as Matplotlib, ggplot2, and D3.js.
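One routine practical task, choosing the dimensionality, can be illustrated with a stress-versus-dimension curve: for data that is genuinely 2-D, stress drops sharply from p = 1 to p = 2 and stays near zero afterwards, marking p = 2 as the elbow. A minimal NumPy sketch (helper names `cmds` and `stress1` are illustrative, not a package API):

```python
import numpy as np

def cmds(D, p):
    """Classical MDS embedding in p dimensions (double-centering + eigh)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:p]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

def stress1(D, X):
    """Kruskal's stress-1 of embedding X against dissimilarities D."""
    fit = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    iu = np.triu_indices_from(D, k=1)
    return float(np.sqrt(((fit[iu] - D[iu]) ** 2).sum() / (fit[iu] ** 2).sum()))

# Genuinely 2-D data: the stress curve should show an elbow at p = 2.
rng = np.random.default_rng(0)
pts = rng.standard_normal((20, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
curve = {p: stress1(D, cmds(D, p)) for p in (1, 2, 3)}
```

In practice the same curve is computed with whatever solver is in use, and the elbow is read off by eye or combined with cross-validation.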

Category:Data visualization