ExAC
Name	ExAC
Full name	Exome Aggregation Consortium dataset
Released	2014–2016
Producers	Broad Institute; Massachusetts General Hospital; Harvard Medical School; Wellcome Trust Sanger Institute
Access	public download and browser
Taxa	humans
Scope	human protein-coding variation
Size	60,706 exomes

Contents

Background and development
Database composition and methodology
Data access and tools
Applications in clinical and research genomics
Limitations and criticisms
Legacy and successor projects

ExAC, the Exome Aggregation Consortium dataset, summarized protein-coding variation from the exomes of tens of thousands of people to provide allele frequency context for medical genetics and population genomics. Developed by collaborators at major biomedical centers, the resource aggregated exome data to improve interpretation of rare variants observed in clinical sequencing and to enable population-scale studies of human genetic diversity. ExAC influenced projects across translational genomics, population genetics, and precision medicine.

Background and development

ExAC arose from a multi-institutional effort involving the Broad Institute, Harvard Medical School, Massachusetts General Hospital, the Wellcome Trust Sanger Institute, and numerous sequencing centers and research groups. Motivated by variant-interpretation challenges described in clinical reports from institutions such as Boston Children's Hospital and by initiatives like the 1000 Genomes Project, investigators sought to aggregate exome data generated for diverse studies, including disease-focused projects at Stanford University, University of Cambridge, University of Oxford, and the National Institutes of Health. Leadership and analysis teams included researchers affiliated with the NHGRI and consortia that had previously collaborated on variant interpretation guidelines with bodies like the American College of Medical Genetics and Genomics. Public release of allele frequency summaries between 2014 and 2016 followed community discussions about data sharing practices exemplified by platforms such as dbGaP and standards developed by the Global Alliance for Genomics and Health.

Database composition and methodology

ExAC combined exome sequencing data from roughly 60,000 unrelated individuals, with contributors drawn from cohort studies and case-control projects at centers such as Columbia University, University of Washington, Icahn School of Medicine at Mount Sinai, Yale University, and international partners including the Karolinska Institutet and University of Melbourne. Samples were processed through joint genotype calling pipelines implemented at the Broad Institute using tools developed by teams associated with projects like the 1000 Genomes Project and the work that preceded the Genome Aggregation Database. Variant calling leveraged established software such as GATK workflows refined in collaboration with groups at Stanford University, and quality control steps mirrored practices from sequencing centers at the Wellcome Trust Sanger Institute. Metadata harmonization reconciled cohort labels from disease-focused studies at Fred Hutchinson Cancer Research Center and population cohorts like the UK Biobank pilot datasets. Population-structure analyses relied on principal components approaches used in analyses at University of Chicago and University of California, Los Angeles to assign continental ancestry labels and detect outliers.
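
The ancestry-labelling step described above can be illustrated with a small sketch. It assumes that principal component coordinates have already been computed for each sample, and it uses a nearest-centroid rule with a fixed distance cutoff. Both the rule and every name and number here (`assign_label`, `reference`, `group_A`, `group_B`, the cutoff of 1.0) are hypothetical choices made for illustration, not the procedure ExAC itself used.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def assign_label(sample_pcs, reference_pcs_by_label, max_distance):
    """Label a sample by its nearest reference centroid in PC space.

    Samples farther than max_distance from every centroid are reported
    as 'outlier' instead of being forced into a group.
    """
    best_label, best_dist = None, math.inf
    for label, points in reference_pcs_by_label.items():
        dist = math.dist(sample_pcs, centroid(points))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "outlier"


# Toy two-dimensional PC coordinates; invented values, not real data.
reference = {
    "group_A": [(0.0, 0.0), (0.2, 0.0), (0.1, 0.3)],
    "group_B": [(5.0, 5.0), (5.2, 4.8), (4.8, 5.2)],
}

for sample in [(0.2, 0.1), (4.9, 5.1), (2.5, 2.5)]:
    print(sample, assign_label(sample, reference, max_distance=1.0))
```

A real analysis would typically use more components and a statistically motivated cutoff; the fixed threshold only shows how a sample lying between clusters ends up flagged rather than labelled.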

Data access and tools

ExAC provided downloadable variant callsets and summary allele frequencies through an online browser developed by engineers and scientists with ties to the Broad Institute and the Wellcome Trust Sanger Institute. The browser adopted visualization patterns familiar from resources such as Ensembl and the UCSC Genome Browser and supported queries by gene and position, exposing annotations derived from tools used by groups at Harvard Medical School and Massachusetts General Hospital. Programmatic access enabled integration into clinical pipelines at institutions like Mayo Clinic and research workflows in laboratories at Cold Spring Harbor Laboratory via APIs patterned after services from the National Center for Biotechnology Information. Documentation and user support referenced variant interpretation frameworks developed by entities such as the American College of Medical Genetics and Genomics and clinical databases like ClinVar.
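
To show how a downloaded callset might be turned into allele frequencies, the sketch below parses one data line in VCF format and divides the allele count (`AC`) by the allele number (`AN`), both of which are reserved INFO keys in the VCF specification. The record is invented for illustration, the function names are hypothetical, and any ExAC-specific INFO fields are deliberately left out.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=3;AN=1000;DB' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def allele_frequencies(vcf_line):
    """Return {alt_allele: AC / AN} for one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")          # column 5: ALT alleles
    info = parse_info(fields[7])         # column 8: INFO
    allele_number = int(info["AN"])
    counts = [int(c) for c in info["AC"].split(",")]
    if allele_number == 0:
        # No called genotypes at this site, so no frequency can be stated.
        return {alt: None for alt in alts}
    return {alt: count / allele_number for alt, count in zip(alts, counts)}


# Made-up record for illustration only; not a real ExAC entry.
example = "1\t12345\t.\tA\tG,T\t100\tPASS\tAC=3,1;AN=1000;DB"
freqs = allele_frequencies(example)
print(freqs)
```

Multi-allelic sites list one `AC` value per alternate allele, which is why the counts are zipped against the ALT column rather than read as a single number.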

Applications in clinical and research genomics

Clinicians and researchers used ExAC allele frequencies to filter candidate variants in diagnostic sequencing at hospitals including Cincinnati Children's Hospital Medical Center and specialty centers at Johns Hopkins University. Population geneticists incorporated ExAC into studies of selective constraint, following analytical traditions from work at Harvard and Princeton University, to identify genes under strong purifying selection and to compute metrics employed by projects at MIT and University of California, San Diego. Pharmacogenomics groups at University of Pennsylvania and cancer genetics teams at Memorial Sloan Kettering Cancer Center used ExAC as a reference to interpret variant pathogenicity, while evolutionary biologists at University of California, Berkeley used the dataset to study mutation spectra across ancestries sampled in cohorts affiliated with McMaster University and the University of Toronto.
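
The frequency-based filtering described above can be sketched as follows. The thresholds (`max_af`, `min_an`), the `Candidate` type, and the example variants are assumptions chosen for illustration; appropriate cutoffs depend on the disease model, and nothing here is prescribed by ExAC.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical identifier for the variant
    allele_count: int    # copies of the allele seen in the reference set
    allele_number: int   # total chromosomes with a called genotype


def filter_rare(candidates, max_af=0.001, min_an=2000):
    """Keep candidates that a reference panel cannot rule out as too common.

    A variant is dropped only when the site is well covered
    (allele_number >= min_an) and its frequency exceeds max_af.
    Poorly covered sites are kept, because a low count there is not
    evidence of rarity.
    """
    kept = []
    for c in candidates:
        if c.allele_number < min_an:
            kept.append(c)
            continue
        if c.allele_count / c.allele_number <= max_af:
            kept.append(c)
    return kept


# Invented counts for illustration only.
candidates = [
    Candidate("common", 5000, 100000),
    Candidate("rare", 2, 100000),
    Candidate("absent", 0, 100000),
    Candidate("low_coverage", 30, 200),
]
kept_names = [c.name for c in filter_rare(candidates)]
print(kept_names)
```

Keeping low-coverage sites is a conservative design choice: it avoids discarding a candidate solely because the reference data are too sparse to estimate its frequency.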

Limitations and criticisms

Despite its scale, ExAC faced critiques similar to those raised about earlier resources like the 1000 Genomes Project and databases hosted by the National Center for Biotechnology Information. One limitation was uneven ancestral representation: European-ancestry samples were overrepresented relative to cohorts from regions studied by the African Society of Human Genetics and researchers at University of Lagos. Another was the inclusion of individuals from disease-focused studies, which complicated assumptions about "healthy" status, a concern also noted in population datasets assembled by teams at Emory University and University of Copenhagen. Methodological criticisms highlighted differences in variant calling between Broad Institute pipelines and those of other contributing centers, as well as the absence of uniform phenotype metadata, echoing debates from data-sharing discussions at the Global Alliance for Genomics and Health and the National Institutes of Health.

Legacy and successor projects

ExAC directly motivated the creation of larger and more inclusive resources, most notably the Genome Aggregation Database assembled by teams at the Broad Institute in collaboration with partners at Harvard Medical School, Massachusetts General Hospital, and the Wellcome Trust Sanger Institute. Its influence extended to population-sequencing efforts like the UK Biobank expansion, the All of Us Research Program, and international initiatives coordinated with institutions such as the European Bioinformatics Institute and the Chinese Academy of Sciences. ExAC's practices in data aggregation, annotation, and public access shaped variant interpretation workflows used in clinical laboratories at Mayo Clinic, Cleveland Clinic, and academic medical centers worldwide.
