ANCOM — LLMpedia

ANCOM
Name	ANCOM
Developer	Multiple research groups
Initial release	2015
Latest release	2020s
Programming languages	R, Python, C++
Operating system	Cross-platform
Genre	Statistical method for compositional data
License	Varies (open-source implementations)

Contents

Introduction
Background and Motivation
Methodology
Variants and Extensions
Applications
Limitations and Criticisms
Software Implementations and Usage

ANCOM (Analysis of Composition of Microbiomes) is a statistical method for differential abundance analysis of compositional microbiome data. It was introduced to address the biases that arise when relative abundances from high-throughput sequencing are analyzed as if they were absolute quantities. It offers a hypothesis-testing framework that uses log-ratio transformations and multiple pairwise comparisons to identify taxa associated with experimental covariates. The method has been discussed and extended in microbial ecology, metagenomics, and biostatistics by researchers and institutions working on sequencing-based profiling.

Introduction

ANCOM was first presented in the context of marker-gene surveys and shotgun metagenomics, responding to challenges identified in studies from groups affiliated with the Human Microbiome Project, the National Institutes of Health, the European Molecular Biology Laboratory, the Wellcome Trust Sanger Institute, and academic laboratories at universities such as Stanford University, Harvard University, University of California, San Diego, Massachusetts Institute of Technology, and University of Washington. The approach emphasizes the compositional nature of sequencing data and builds on log-ratio analysis as developed in the statistical literature by researchers at institutions such as Johns Hopkins University and Imperial College London. Early adopters included teams publishing in venues such as Nature, Nature Methods, PLoS Biology, ISME Journal, and Genome Research.

Background and Motivation

The motivation for ANCOM arose from problems highlighted in comparative studies by groups working with data from projects such as the American Gut Project and MetaHIT, and from clinical cohorts at Fred Hutchinson Cancer Research Center and Mayo Clinic. Sequencing yields counts whose totals are set by library size rather than by the true microbial load, so the data carry only relative information and are subject to compositional effects, a point underscored by earlier methodological critiques from authors affiliated with Rothamsted Research and Cornell University. Classical methods such as DESeq2 and edgeR, developed by teams at the European Molecular Biology Laboratory and the Walter and Eliza Hall Institute of Medical Research, are powerful for transcriptomics but may give misleading results for compositional microbiome profiles. ANCOM sought to provide an alternative grounded in Aitchison's log-ratio paradigm and related work by statisticians at University of Copenhagen and Ecole Polytechnique Fédérale de Lausanne.
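
The compositional effect described above can be illustrated with a minimal sketch. The abundances below are hypothetical values chosen for illustration, and `closure` is a helper name introduced here, not part of any ANCOM implementation. Only taxon A changes in absolute terms, yet the relative abundance of taxon B shifts as well, while the log-ratio between the unchanged taxa B and C stays the same.

```python
import math

def closure(counts):
    """Convert absolute abundances to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical absolute abundances for taxa A, B, C in two conditions.
# Only taxon A differs between the conditions.
control = [100.0, 200.0, 700.0]
treated = [1000.0, 200.0, 700.0]

rel_control = closure(control)
rel_treated = closure(treated)

# The relative abundance of B falls although its absolute abundance is unchanged.
b_control, b_treated = rel_control[1], rel_treated[1]

# The log-ratio of B to C is not affected by the closure.
lr_control = math.log(rel_control[1] / rel_control[2])
lr_treated = math.log(rel_treated[1] / rel_treated[2])

print(f"B relative abundance: {b_control:.3f} -> {b_treated:.3f}")
print(f"log(B/C): {lr_control:.4f} -> {lr_treated:.4f}")
```

A test applied directly to the proportions would flag B and C as changed, which is the kind of false signal that log-ratio methods are meant to avoid.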

Methodology

ANCOM evaluates differences in abundance by transforming count data into pairwise log-ratios between taxa. For each taxon, it tests every log-ratio involving that taxon against the condition of interest and counts the rejected tests in a statistic commonly denoted W. A taxon is declared differentially abundant when W is large relative to the number of other taxa, that is, when most of its log-ratios are associated with the condition. The method leverages multiple testing corrections and nonparametric hypothesis tests of the kind used in studies from Columbia University, Yale University, and University of Michigan. Core steps involve data preprocessing common to pipelines produced by the QIIME, mothur, and DADA2 teams; computation of pairwise log-ratios similar to approaches discussed by researchers at University of Florida and University of Colorado; and decision rules that control false discovery, informed by work from Stanford University's Department of Statistics and University of California, Berkeley. ANCOM's original algorithm handles zeros through imputation or pseudo-count addition, reflecting practices found in analyses by groups at University of British Columbia and McMaster University.
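
The pairwise log-ratio procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the reference implementation: the counts are hypothetical, the helper names `mann_whitney_p` and `ancom_w` are introduced here, the test is a Mann-Whitney U test with a normal approximation, p-values are compared to `alpha` without multiplicity adjustment, and the 0.7 cutoff fraction is an illustrative choice.

```python
import math

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, midranks for ties)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(z) / math.sqrt(2.0))

def ancom_w(counts, groups, pseudo_count=1.0, alpha=0.05):
    """Return W[i]: the number of taxa j whose log-ratio with taxon i differs by group."""
    n_taxa = len(counts[0])
    logs = [[math.log(c + pseudo_count) for c in sample] for sample in counts]
    w = [0] * n_taxa
    for i in range(n_taxa):
        for j in range(i + 1, n_taxa):
            ratios = [row[i] - row[j] for row in logs]  # log(x_i / x_j) per sample
            g0 = [r for r, g in zip(ratios, groups) if g == 0]
            g1 = [r for r, g in zip(ratios, groups) if g == 1]
            if mann_whitney_p(g0, g1) < alpha:
                w[i] += 1  # the two-sided test is symmetric in i and j
                w[j] += 1
    return w

# Hypothetical counts: 10 samples x 4 taxa; only taxon 0 is elevated in group 1.
counts = [
    [10, 50, 30, 20], [12, 45, 35, 22], [9, 55, 28, 18], [11, 48, 33, 25], [8, 52, 31, 21],
    [100, 47, 32, 19], [120, 53, 29, 23], [90, 49, 34, 20], [110, 51, 30, 24], [95, 46, 36, 22],
]
groups = [0] * 5 + [1] * 5

w = ancom_w(counts, groups)
cutoff = 0.7 * (len(w) - 1)  # illustrative cutoff fraction (assumption)
detected = [i for i, wi in enumerate(w) if wi >= cutoff]
print("W:", w, "detected taxa:", detected)
```

In this toy data set each unchanged taxon still accumulates one rejection, from its ratio with taxon 0. That is why the decision rule requires a large share of a taxon's log-ratios to be significant rather than any single one.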

Variants and Extensions

Several variants and extensions of ANCOM have been proposed by collaborators and independent groups at institutions like Argonne National Laboratory, University of Illinois Urbana-Champaign, Max Planck Institute for Biology, and Johns Hopkins Bloomberg School of Public Health. Notable extensions include versions that incorporate mixed-effects models to account for repeated measures (used in longitudinal cohorts at Karolinska Institutet and University of Oxford), adaptations that integrate phylogenetic information similar to methods from University of Chicago and University of California, Davis, and robustified implementations that adjust for sampling depth and sparsity inspired by work at Princeton University and University of Toronto. Comparative evaluations have been published in outlets such as Microbiome and Frontiers in Microbiology.

Applications

ANCOM and its derivatives have been used across studies involving human cohorts at Johns Hopkins University School of Medicine, Columbia University Irving Medical Center, UCLA, and Mount Sinai Health System, as well as environmental surveys conducted by teams at Scripps Institution of Oceanography, Woods Hole Oceanographic Institution, and USDA. Applications include investigations of gut microbiota in disease cohorts studied at Mayo Clinic and Cleveland Clinic, soil microbial community comparisons in agricultural research at Iowa State University, and bioreactor community dynamics in industrial microbiology projects with collaborators at ETH Zurich and Nanyang Technological University.

Limitations and Criticisms

Critiques of ANCOM have been voiced in methodological comparisons led by researchers at University of Texas Southwestern Medical Center and University College London, noting sensitivity to sparsity, dependence on pseudo-count choices described by statisticians at Rice University, and challenges in settings with many rare taxa similar to observations from University of Pittsburgh studies. Other discussions from groups at National University of Singapore and Monash University emphasize trade-offs between sensitivity and control of false positives, and the behavior of pairwise log-ratio tests under violations of modeling assumptions examined by investigators at Carnegie Mellon University.
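
The dependence on the pseudo-count can be seen in a minimal sketch. The counts and pseudo-count values below are hypothetical, and `log_ratio` is a helper name introduced here for illustration. A rare taxon is unobserved in one sample and has four reads in another, and its log-ratio against an abundant reference taxon is computed under three pseudo-counts.

```python
import math

def log_ratio(a, b, pseudo_count):
    """Log-ratio of two counts after adding the same pseudo-count to each."""
    return math.log((a + pseudo_count) / (b + pseudo_count))

# Hypothetical counts: a rare taxon (0 reads in sample 1, 4 reads in sample 2)
# measured against an abundant reference taxon with 1000 reads in both samples.
rare_sample_1, rare_sample_2 = 0, 4
reference = 1000

diffs = {}
for pc in (0.01, 0.5, 1.0):
    # Between-sample difference in the rare-vs-reference log-ratio.
    diffs[pc] = (log_ratio(rare_sample_2, reference, pc)
                 - log_ratio(rare_sample_1, reference, pc))
    print(f"pseudo-count {pc}: log-ratio difference = {diffs[pc]:.2f}")
```

The same pair of samples yields a between-sample difference ranging from about 6.0 to about 1.6 on the natural-log scale, depending only on the pseudo-count. For sparse taxa this choice can therefore drive whether a log-ratio test rejects.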

Software Implementations and Usage

Open-source implementations of ANCOM exist in R and Python, languages supported by The R Project for Statistical Computing and the Python Software Foundation, and are distributed through repositories hosted on GitHub and Bioconductor. Implementations have been integrated into analysis workflows using QIIME 2, Bioconductor packages maintained by groups at Fred Hutch and Roswell Park Comprehensive Cancer Center, and interactive notebooks produced by teams at Broad Institute. Documentation, tutorials, and benchmarking scripts have been produced by academic labs at University of Helsinki, Indian Institute of Science, and University of São Paulo.

Category:Statistical methods