gnomAD — LLMpedia

gnomAD
Name	gnomAD
Title	Genome Aggregation Database
Creator	Broad Institute
Launched	2016
Latest release	v4.1

Contents

Overview
Data collection and processing
Variant annotation and quality control
Population structure and allele frequencies
Applications in research and clinical genetics
Limitations and ethical considerations

gnomAD (the Genome Aggregation Database) is a large-scale human genetic variation resource that aggregates exome and genome sequencing data from numerous consortia and cohorts to provide population allele frequencies and variant annotations. It was developed at the Broad Institute, with collaborators including Massachusetts General Hospital, as the successor to the Exome Aggregation Consortium (ExAC). It integrates data contributed by projects such as the 1000 Genomes Project and by disease-focused case-control studies. gnomAD informs variant interpretation in clinical genetics, population genetics, and genomic medicine by offering allele frequency benchmarks and aggregated quality metrics for millions of variants.

Overview

gnomAD aggregates human sequencing data from tens to hundreds of thousands of individuals (more than 800,000 in v4) contributed by academic centers and consortia. Sources have included the 1000 Genomes Project, ExAC, UK Biobank, TOPMed, and many disease-focused studies. The project produces public releases (for example v2, v3, and v4) that supply allele frequency tables and site-level quality metrics. Because the releases are aggregate, individual-level genotypes are generally not distributed. These data are used by clinical diagnostic laboratories and research centers worldwide. gnomAD also connects to international standards and resources. Its frequencies are used when applying the American College of Medical Genetics and Genomics (ACMG) variant classification guidelines, variant consequences are described with Human Genome Variation Society (HGVS) nomenclature, and the gnomAD browser links variants to ClinVar and dbSNP.
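Release files of this kind are commonly distributed as sites-only VCFs, where each record carries summary counts in its INFO column. The sketch below shows how those aggregate values can be read back. It assumes a single alternate allele per record and uses the standard VCF reserved keys `AC` (allele count), `AN` (allele number) and `AF` (allele frequency). The helper names and the example INFO string are hypothetical and are not taken from a real gnomAD file.

```python
from typing import Dict, Tuple, Union

# Value type for INFO entries: strings for key=value pairs, True for bare flags.
InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO column ("K1=V1;K2=V2;FLAG") into a dict."""
    fields: Dict[str, InfoValue] = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue  # skip empty entries or a missing INFO column
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def site_frequency(info: str) -> Tuple[int, int, float]:
    """Return (AC, AN, AF) for a record with a single alternate allele.

    AF is recomputed as AC / AN so it can be checked against the stored
    AF value; AN == 0 (no called alleles) yields AF = 0.0.
    """
    fields = parse_info(info)
    ac = int(str(fields["AC"]))
    an = int(str(fields["AN"]))
    af = ac / an if an > 0 else 0.0
    return ac, an, af


if __name__ == "__main__":
    # Hypothetical INFO string for illustration only.
    record_info = "AC=12;AN=152000;AF=7.89474e-05;example_flag"
    ac, an, af = site_frequency(record_info)
    stored_af = float(str(parse_info(record_info)["AF"]))
    print(ac, an, af, abs(af - stored_af) < 1e-9)
```

A real release VCF carries many more INFO keys, including per-group counts whose naming varies between releases, so check the header of the specific file before relying on any key name.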

Data collection and processing

Data for gnomAD are obtained from sequencing centers, clinical programs, and population cohorts, including the 1000 Genomes Project, UK Biobank, TOPMed, and disease-focused studies. Wherever possible, gnomAD does not pool variant calls made separately by each project. Instead, it reprocesses the contributed sequence data through a common pipeline. Reads are aligned to the reference genome with BWA (developed at the Wellcome Sanger Institute), and variants are called with GATK (developed at the Broad Institute) following the GATK Best Practices, with genotypes called jointly across samples. Uniform processing reduces technical differences between contributing projects. Sample metadata from partner projects such as ExAC, the 1000 Genomes Project, and UK Biobank is harmonized for use in quality control and ancestry assignment.
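Once variants have been called, the aggregate frequencies that gnomAD publishes are counts taken over individuals. The minimal sketch below illustrates that arithmetic. It assumes diploid, biallelic genotypes coded as pairs of allele indices and treats missing or QC-failed calls as absent. It is not gnomAD's pipeline, which uses the Hail framework at far larger scale. The function name and the `nhomalt` key for the homozygote count are used here for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

# A diploid genotype as a pair of allele indices (0 = reference,
# 1 = alternate), or None when the call is missing or failed genotype QC.
Genotype = Optional[Tuple[int, int]]


def aggregate_site(genotypes: Sequence[Genotype]) -> Dict[str, Union[int, float]]:
    """Summarize one biallelic site across individuals.

    AC: alternate alleles among called genotypes
    AN: total alleles among called genotypes (2 per diploid call)
    AF: AC / AN, or 0.0 when nothing was called
    nhomalt: number of homozygous-alternate individuals
    """
    ac = an = n_hom_alt = 0
    for gt in genotypes:
        if gt is None:
            continue  # missing calls contribute to neither AC nor AN
        alt_copies = sum(1 for allele in gt if allele == 1)
        an += len(gt)
        ac += alt_copies
        if alt_copies == len(gt):
            n_hom_alt += 1
    af = ac / an if an else 0.0
    return {"AC": ac, "AN": an, "AF": af, "nhomalt": n_hom_alt}


if __name__ == "__main__":
    # Hypothetical genotypes for five individuals; one call is missing.
    calls = [(0, 0), (0, 1), (1, 1), None, (0, 0)]
    print(aggregate_site(calls))
```

Dropping missing calls from AN, rather than counting them as reference, keeps AF from being biased downward at sites with poor coverage.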

Variant annotation and quality control

Variants are annotated with Ensembl's Variant Effect Predictor (VEP). VEP assigns predicted functional consequences relative to reference transcripts and can report SIFT and PolyPhen-2 missense predictions. Predicted loss-of-function variants are further assessed with the LOFTEE plugin. Conversely, annotation frameworks such as ANNOVAR can incorporate gnomAD frequencies as an annotation source.

Quality control builds on approaches developed for ExAC and operates at the sample, genotype, and site levels. Samples with outlying sequencing metrics, evidence of contamination, or close relatedness to other samples are removed or excluded from frequency calculations. Metrics are also compared across sequencing platforms and batches to detect batch effects. Low-quality genotypes (for example, those with low genotype quality, low read depth, or skewed allele balance) are excluded from frequency calculations. Site-level filters flag likely artifacts, including sites flagged by a machine-learning model trained on known high-quality variants, sites with no remaining high-quality alternate genotypes, and sites with an excess of heterozygotes. Filtered sites remain in the release files with their filter flags, and sites can be cross-referenced with clinical assertions in ClinVar and identifiers in dbSNP.
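Genotype-level and site-level filtering of this kind can be sketched as follows. The flag names `AC0` and `InbreedingCoeff` follow names used in gnomAD release files. Every numeric threshold below, however, is an illustrative assumption rather than the value used in any particular release. The machine-learning site filter is not modeled. The excess-heterozygosity check uses the textbook inbreeding coefficient, F = 1 - observed / expected heterozygotes under Hardy-Weinberg equilibrium. The `Call` record and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative thresholds only (assumptions, not values from a specific release).
MIN_GQ = 20                  # minimum genotype quality
MIN_DP = 10                  # minimum read depth
MIN_HET_AB = 0.2             # minimum alt-read fraction for heterozygous calls
MIN_INBREEDING_COEFF = -0.3  # below this, flag excess heterozygosity


@dataclass
class Call:
    gt: Tuple[int, int]  # biallelic diploid genotype, e.g. (0, 1)
    gq: int              # genotype quality
    dp: int              # read depth
    ab: float            # fraction of reads supporting the alternate allele


def passes_genotype_qc(call: Call) -> bool:
    """Genotype-level filter: enough quality and depth, balanced hets."""
    if call.gq < MIN_GQ or call.dp < MIN_DP:
        return False
    is_het = call.gt[0] != call.gt[1]
    return not (is_het and call.ab < MIN_HET_AB)


def inbreeding_coeff(calls: List[Call]) -> Optional[float]:
    """F = 1 - observed / expected heterozygotes under Hardy-Weinberg."""
    n = len(calls)
    if n == 0:
        return None
    alt_alleles = sum(c.gt[0] + c.gt[1] for c in calls)  # valid for 0/1 coding
    p = alt_alleles / (2 * n)
    expected_het = 2 * p * (1 - p) * n
    if expected_het == 0:
        return None  # monomorphic among these calls; F is undefined
    observed_het = sum(1 for c in calls if c.gt[0] != c.gt[1])
    return 1 - observed_het / expected_het


def site_flags(calls: List[Call]) -> List[str]:
    """Site-level flags computed only from genotypes that pass genotype QC."""
    good = [c for c in calls if passes_genotype_qc(c)]
    flags: List[str] = []
    if not any(1 in c.gt for c in good):
        flags.append("AC0")  # no high-quality alternate genotype remains
    f = inbreeding_coeff(good)
    if f is not None and f < MIN_INBREEDING_COEFF:
        flags.append("InbreedingCoeff")  # excess heterozygotes
    return flags
```

A site where every individual is heterozygous gives F = -1 and is flagged. Sequencing artifacts such as collapsed paralogs tend to produce this pattern, which true variants rarely show.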

Population structure and allele frequencies

gnomAD reports allele frequencies stratified by genetic ancestry groups. These are largely continental, with some subcontinental subdivisions. Examples include African/African American, Admixed American, Ashkenazi Jewish, East Asian, Finnish, non-Finnish European, and South Asian groups. Samples contributed by projects such as the 1000 Genomes Project, UK Biobank, and regional studies are assigned to groups by genetic similarity rather than by self-report alone. The samples are projected onto principal components computed from common variants, and a classifier trained on samples with known ancestry labels assigns each remaining sample to a group. Samples that do not fit any group confidently are placed in a residual category. Frequency data are routinely compared against observations in disease cohorts and diagnostic laboratory data. Analyses of population structure build on methods and reference panels from studies such as the Human Genome Diversity Project, the Simons Genome Diversity Project, and consortia like H3Africa.
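Stratification matters because a variant that is rare in the pooled dataset can be common in one group. The minimal sketch below assumes per-group `(AC, AN)` pairs keyed by hypothetical group labels. It computes per-group frequencies, the pooled frequency, and the group with the highest frequency. The `exclude` parameter lets a caller skip groups, for example founder or catch-all groups, when summarizing the maximum. Which groups gnomAD itself excludes from its maximum-group summaries is a release-specific choice that this sketch does not model.

```python
from typing import Dict, Iterable, Optional, Tuple

# Per-group counts: group label -> (allele count AC, allele number AN).
GroupCounts = Dict[str, Tuple[int, int]]


def group_afs(counts: GroupCounts) -> Dict[str, float]:
    """Allele frequency per group, skipping groups with no called alleles."""
    return {group: ac / an for group, (ac, an) in counts.items() if an > 0}


def pooled_af(counts: GroupCounts) -> float:
    """Frequency across all groups combined (sum of AC over sum of AN)."""
    total_ac = sum(ac for ac, _ in counts.values())
    total_an = sum(an for _, an in counts.values())
    return total_ac / total_an if total_an else 0.0


def max_group_af(
    counts: GroupCounts, exclude: Iterable[str] = ()
) -> Optional[Tuple[str, float]]:
    """Group with the highest AF, ignoring any groups listed in `exclude`."""
    skipped = set(exclude)
    afs = {g: af for g, af in group_afs(counts).items() if g not in skipped}
    if not afs:
        return None
    best = max(afs, key=lambda g: afs[g])
    return best, afs[best]


if __name__ == "__main__":
    # Hypothetical counts; labels mimic short group codes for illustration.
    counts = {"afr": (2, 20000), "nfe": (1, 100000), "fin": (60, 20000)}
    print(pooled_af(counts))                     # low when pooled
    print(max_group_af(counts))                  # much higher in one group
    print(max_group_af(counts, exclude=["fin"]))
```

In this made-up example, the pooled frequency is below 0.05%, while one group carries the variant at 0.3%. Filtering on the pooled value alone would therefore understate how common the variant is in that group.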

Applications in research and clinical genetics

Researchers at academic and clinical centers use gnomAD to filter out variants that are too common in the general population to cause rare disorders, narrowing candidate lists in studies of rare disease. Clinical laboratories use gnomAD allele frequencies when applying the ACMG/AMP variant classification criteria. Examples include the stand-alone benign criterion BA1 (allele frequency above 5%) and the strong benign criterion BS1 (allele frequency greater than expected for the disorder). Variant curators at ClinGen and submitters to ClinVar draw on the same data. Population genetics studies combine gnomAD frequencies with data from the 1000 Genomes Project, UK Biobank, and TOPMed in analyses of selection and demographic history. gnomAD also provides gene-level constraint metrics. These compare the observed number of variants of a given class, such as predicted loss-of-function variants, with the number expected under a mutational model. Genes with far fewer loss-of-function variants than expected are considered intolerant to loss of function. Drug target validation efforts at pharmaceutical companies and translational research centers use these metrics to assess the likely consequences of inhibiting a target.
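Three of these uses reduce to simple arithmetic, sketched below under stated assumptions. The first is a frequency filter for candidate lists with a caller-chosen cutoff. The second is the BA1 check, which compares one frequency supplied by the caller against the 5% threshold. The third is the point estimate of an observed/expected ratio for one variant class in a gene. Variant records are hypothetical dicts with `id` and `af` keys. The expected count would come from a mutational model, which is not reproduced here. gnomAD's published LOEUF metric is the upper bound of a confidence interval around observed/expected, not the point estimate computed here.

```python
from typing import Dict, List

BA1_THRESHOLD = 0.05  # ACMG/AMP stand-alone benign criterion: AF above 5%


def meets_ba1(allele_frequency: float) -> bool:
    """True when the frequency exceeds the 5% BA1 threshold."""
    return allele_frequency > BA1_THRESHOLD


def filter_rare(variants: List[Dict], max_af: float) -> List[Dict]:
    """Keep candidate variants at or below a caller-chosen AF cutoff.

    Each variant is a hypothetical dict with "id" and "af" keys; a variant
    absent from the reference database can be given af = 0.0.
    """
    return [v for v in variants if v["af"] <= max_af]


def observed_expected(observed: int, expected: float) -> float:
    """Point estimate of observed/expected variant counts for a gene.

    Values well below 1 indicate depletion of that variant class, for
    example fewer loss-of-function variants than the model predicts.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


if __name__ == "__main__":
    # Hypothetical candidates and cutoff for illustration only.
    candidates = [
        {"id": "var1", "af": 0.0},
        {"id": "var2", "af": 0.00002},
        {"id": "var3", "af": 0.12},
    ]
    print([v["id"] for v in filter_rare(candidates, max_af=0.0001)])
    print(meets_ba1(0.12), observed_expected(3, 40.0))
```

The appropriate rare-variant cutoff depends on the disorder's prevalence, penetrance, and inheritance mode, which is why it is left as a parameter rather than fixed.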

Limitations and ethical considerations

Limitations noted by gnomAD investigators start with the underrepresentation of many global populations. Most samples are of European ancestry, and many populations, including much of the genetic diversity within Africa, are sparsely sampled or absent. Dedicated efforts such as H3Africa aim to address this gap. A second limitation is ascertainment bias arising from the contributing cohorts. Many samples come from disease studies or from biobanks such as UK Biobank rather than from random population samples. Although individuals known to have severe pediatric disease are excluded, the dataset is not a disease-free control set. A third limitation is technical differences between exome and genome data and across sequencing batches. Ethical and governance considerations have been discussed in forums involving organizations such as NHGRI, the Wellcome Trust, and the European Genome-phenome Archive. They have also been discussed with community projects, including H3Africa, and with Indigenous research partners. These discussions address the consent models under which contributing cohorts were collected. Data-sharing policies reflect standards from bodies such as the Global Alliance for Genomics and Health and emphasize responsible use in clinical settings guided by ACMG recommendations.

Category:Genetic databases