LLMpedia: the first transparent, open encyclopedia generated by LLMs

New Ways of Analyzing Variation

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Chilote Spanish (Hop 5)
Expansion Funnel: Raw 132 → Dedup 0 → NER 0 → Enqueued 0
New Ways of Analyzing Variation
Name: New Ways of Analyzing Variation
Field: Statistics; Computational Biology; Data Science
Introduced: 21st century

New Ways of Analyzing Variation explores contemporary methods for quantifying, modeling, and visualizing variation across systems, integrating statistical theory, computational algorithms, and domain-specific applications. The topic connects advances from institutions such as the Massachusetts Institute of Technology, Stanford University, Harvard University, the University of Cambridge, and the University of Oxford to applied research at the National Institutes of Health, the European Molecular Biology Laboratory, the Wellcome Trust, and the Howard Hughes Medical Institute, and to industrial labs such as Google, Microsoft, Amazon, and IBM. It synthesizes contributions from scholars associated with the Isaac Newton Institute, the Alan Turing Institute, the Santa Fe Institute, Cold Spring Harbor Laboratory, and the Max Planck Society.

Introduction

The study of variation has been transformed by syntheses across work at the Royal Society, the National Academy of Sciences, the American Statistical Association, the Institute of Mathematical Statistics, and the Society for Industrial and Applied Mathematics that blend methods associated with figures such as Karl Pearson, Ronald Fisher, John Tukey, Andrey Kolmogorov, Norbert Wiener, Claude Shannon, Jerome Friedman, Bradley Efron, Leo Breiman, Geoffrey Hinton, Yann LeCun, and Michael I. Jordan. Modern research leverages collaborations with centers such as the Broad Institute, the Sanger Institute, and the European Bioinformatics Institute, and with consortia such as the Human Genome Project, the ENCODE Project, and the 1000 Genomes Project, to address variation at multiple scales.

Mathematical Foundations and Statistical Models

Foundational mathematical frameworks draw on work associated with Pierre-Simon Laplace, Thomas Bayes, and Andrey Kolmogorov, and formalize variance, covariance, and higher moments using ideas propagated through Fisher–Neyman theory and developments at Bell Labs, Princeton University, and the University of Chicago. Contemporary models integrate hierarchical Bayesian techniques from Harvard University, shrinkage and resampling methods from Stanford University and the University of California, Berkeley, penalized-likelihood approaches linked to Robert Tibshirani and Trevor Hastie, and nonparametric inference inspired by Kolmogorov–Smirnov statistics and Markov processes. Work on mixed-effects models connects to applications at the Centers for Disease Control and Prevention and the World Health Organization, and uses estimators developed in the lineage of C. R. Rao, Paul Lévy, and Norbert Wiener.
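The variance decomposition underlying mixed-effects and hierarchical models is the law of total variance: the pooled variance splits exactly into a within-group and a between-group component. A minimal sketch with hypothetical grouped measurements, using only Python's standard library:

```python
from statistics import mean, pvariance

# Hypothetical measurements grouped by condition
groups = [[2.0, 4.0, 6.0], [1.0, 3.0], [5.0, 7.0, 9.0, 11.0]]
pooled = [x for g in groups for x in g]
n = len(pooled)
grand = mean(pooled)

# Within-group component: size-weighted average of group variances
within = sum(len(g) * pvariance(g) for g in groups) / n
# Between-group component: size-weighted variance of the group means
between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / n

total = pvariance(pooled)  # equals within + between exactly
```

The identity `total == within + between` is what lets mixed-effects models attribute observed variation to separate levels of a hierarchy.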

Computational and Machine Learning Approaches

Algorithmic advances are driven by techniques emerging from Google DeepMind, OpenAI, Facebook AI Research, Microsoft Research, and IBM Research, and by academic groups at Carnegie Mellon University, the California Institute of Technology, the University of Toronto, and ETH Zurich. Methods include ensemble models following ideas from Leo Breiman and Yann LeCun, deep generative models influenced by Ian Goodfellow and Diederik Kingma, kernel methods informed by Bernhard Schölkopf and Vladimir Vapnik, and scalable optimization developed at the Courant Institute and INRIA. Distributed computing and big-data frameworks used for variation analysis rely on infrastructures such as Apache Hadoop and Apache Spark and on high-performance resources at Argonne National Laboratory and Lawrence Berkeley National Laboratory.
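The resampling idea shared by Breiman's bagging ensembles and Efron's bootstrap can be sketched in a few lines: draw resamples with replacement, apply the estimator to each, and read off the spread of the results as an estimate of the estimator's sampling variance. The data and the `bootstrap_variance` helper below are illustrative, not drawn from any named system:

```python
import random
from statistics import mean, pvariance

random.seed(0)

# Hypothetical measurements; the bootstrap resamples them with replacement
data = [3.1, 2.7, 4.0, 3.6, 2.9, 3.3, 4.2, 3.8]

def bootstrap_variance(sample, estimator=mean, reps=2000):
    """Variance of an estimator across bootstrap resamples of one dataset."""
    stats = [estimator(random.choices(sample, k=len(sample)))
             for _ in range(reps)]
    return pvariance(stats)

se2 = bootstrap_variance(data)  # close to pvariance(data) / len(data)
```

For the sample mean this approximates the analytic value sigma^2/n; for estimators with no closed-form variance, the same loop still works, which is the method's appeal.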

Applications in Genetics and Evolutionary Biology

Novel analyses of variation in genetics use techniques from projects at the Broad Institute, the Wellcome Trust Sanger Institute, the National Human Genome Research Institute, and the European Molecular Biology Laboratory, and link to theoretical frameworks developed by Sewall Wright, J. B. S. Haldane, Theodosius Dobzhansky, Motoo Kimura, and Stephen Jay Gould. Methods for population structure, selection scans, and genotype–phenotype mapping draw on tools developed in collaboration with the 1000 Genomes Project, UK Biobank, and the Genome Aggregation Database, and on analytic pipelines used at Cold Spring Harbor Laboratory. Phylogenetic and coalescent models relate to work at the Smithsonian Institution and the Natural History Museum, London.
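Wright's fixation index F_ST, a standard summary of population structure, measures how much allele-frequency variation lies between subpopulations: F_ST = (H_T − H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S the mean heterozygosity within subpopulations. A minimal sketch assuming one biallelic locus and equally weighted demes:

```python
def fst(freqs):
    """Wright's F_ST from per-subpopulation allele frequencies (equal weights)."""
    n = len(freqs)
    p_bar = sum(freqs) / n                          # mean allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                   # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / n   # mean within-deme heterozygosity
    return (h_t - h_s) / h_t

# Two strongly differentiated demes vs. two identical ones
fst([0.2, 0.8])  # high differentiation
fst([0.5, 0.5])  # no differentiation: 0.0
```

Identical subpopulations give F_ST = 0; fully fixed alternative alleles give F_ST = 1.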

Applications in Ecology and Environmental Science

Analyses of spatial and temporal variation inform projects at the United Nations Environment Programme, the Intergovernmental Panel on Climate Change, the National Oceanic and Atmospheric Administration, the Environmental Protection Agency, and the Woods Hole Oceanographic Institution, along with field programs run by the Smithsonian Tropical Research Institute and the Monterey Bay Aquarium Research Institute. Methods adapt mixed models, spatial autocorrelation techniques refined at the Scripps Institution of Oceanography, species distribution models informed by collaborations with the Royal Botanic Gardens, Kew, and remote-sensing pipelines integrating data from NASA and the European Space Agency.
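A common spatial-autocorrelation statistic in this setting is Moran's I, which compares deviations from the mean at neighboring sites: values near +1 indicate clustering of similar values, near 0 spatial randomness, and negative values dispersion. A minimal sketch with a hypothetical four-site transect and a binary adjacency matrix:

```python
def morans_i(values, weights):
    """Global Moran's I for site values under a spatial weight matrix."""
    n = len(values)
    xbar = sum(values) / n
    dev = [x - xbar for x in values]
    # Cross-products of deviations at connected sites
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Four sites along a transect; neighbors share an edge (hypothetical data)
vals = [1.0, 2.0, 3.0, 4.0]
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
```

The smooth gradient in `vals` yields a positive I, as neighboring sites carry similar deviations.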

Visualization and Exploratory Data Analysis

Visualization practices build on legacies from John Tukey, Edward Tufte, Ben Shneiderman, and Stuart K. Card, and draw on software ecosystems such as the R Project for Statistical Computing and Python, with libraries including NumPy, SciPy, Pandas, Matplotlib, ggplot2, D3.js, and Tableau. Interactive dashboards and reproducible workflows follow standards promoted by the Journal of the American Statistical Association, Nature Methods, and PLOS Computational Biology, and platforms such as GitHub and Zenodo.
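A core Tukey contribution that these visualization libraries build on is the five-number summary (minimum, lower quartile, median, upper quartile, maximum), the data behind a box plot. A minimal sketch using Tukey's hinge convention of taking medians of each half, with only the standard library:

```python
from statistics import median

def five_number_summary(data):
    """Tukey's five-number summary: the values a box plot displays."""
    s = sorted(data)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]  # halves, excluding the
    # middle element when n is odd (Tukey's hinges)
    return (s[0], median(lower), median(s), median(upper), s[-1])
```

Note that libraries differ in quartile conventions (hinges vs. interpolated percentiles), so values from NumPy or R may differ slightly on small samples.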

Challenges, Limitations, and Future Directions

Open challenges link to ethics and governance initiatives at the World Economic Forum, the United Nations Educational, Scientific and Cultural Organization, the European Commission, and the National Science Foundation, and to legal frameworks shaped by rulings of the European Court of Human Rights and the United States Supreme Court, by regulations such as the General Data Protection Regulation, and by policy from the U.S. Food and Drug Administration. Future directions include integration of methods developed at the Allen Institute for AI, expanded reproducibility efforts championed by the Reproducibility Project, interdisciplinary training at the Massachusetts Institute of Technology and Stanford University, and coordinated data infrastructures modeled on the FAIR principles and programs at the National Institutes of Health.

Category:Statistical methods