Exome Aggregation Consortium

Name	Exome Aggregation Consortium
Abbreviation	ExAC
Formation	2014
Founders	Mark J. Daly, Daniel G. MacArthur, Matthew E. Hurles
Purpose	Aggregation of human exome sequencing data for allele frequency reference
Location	International

Contents

Background and Objectives
Data Collection and Composition
Methods and Quality Control
Key Findings and Publications
Impact on Clinical and Research Genomics
Limitations and Ethical Considerations

The Exome Aggregation Consortium (ExAC) was an international collaboration that built a large reference database of human protein-coding variation by aggregating and jointly processing exome sequencing data from many independent studies. Its main release covered 60,706 individuals and was designed to improve the interpretation of rare variants in human disease. The project brought together academic, clinical, and industrial groups so that allele frequencies could be compared across diverse populations, informing research in human genetics and genomics as well as clinical genetic diagnosis.

Background and Objectives

ExAC was initiated to address a problem in variant interpretation recognized by investigators at institutions such as the Wellcome Sanger Institute, Broad Institute, Massachusetts General Hospital, Harvard Medical School, and University of Cambridge, and by research projects including the 1000 Genomes Project, UK10K, Genome of the Netherlands, and deCODE genetics in Iceland: without large, uniformly processed reference panels, rare variants found in patients could not be reliably distinguished from variation that is common in healthy populations. The consortium aimed to aggregate exome data from sequencing cohorts assembled by groups such as Children's Hospital of Philadelphia, the NIH, University of California, San Francisco, Vanderbilt University, Yale University, McGill University Health Centre, Karolinska Institutet, and the Genome Institute at Washington University, and to process them into a single high-quality allele frequency resource. That resource was intended to support variant curation in databases such as ClinVar, the application of American College of Medical Genetics and Genomics guidelines, and the clinical pipelines of diagnostic laboratories such as those at Mayo Clinic and GeneDx.

Data Collection and Composition

ExAC's main release compiled exome sequencing data from 60,706 individuals contributed by research studies, biobanks, and disease-focused consortia, including deCODE genetics, the Simons Foundation Autism Research Initiative, Autism Speaks, the Broad Institute's sequencing centers, the Framingham Heart Study, COPDGene, and consortia at institutions such as Mount Sinai, Johns Hopkins University, and Stanford University. Each sample was assigned a population label: African/African American, Latino, East Asian, Finnish, non-Finnish European, or South Asian, with an "Other" category for individuals not assigned to these groups. Individuals ascertained for severe pediatric disease, the focus of studies such as Deciphering Developmental Disorders (DDD), were excluded, so that the data could serve as a reference for studies of severe disease. Contributors included clinicians and investigators from Baylor College of Medicine, Columbia University Medical Center, University of Michigan, University of Oxford, and Emory University.
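Because frequencies were reported per population label, a variant can be rare overall yet noticeably more common in one group. The sketch below computes an allele frequency as allele count (AC) divided by allele number (AN), both globally and per population, and reports the population with the highest frequency. The population codes and all counts are hypothetical illustrative values, not ExAC data, and the helper names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopulationCounts:
    """Allele count (AC) and allele number (AN) for one population at one site."""
    ac: int  # alternate alleles observed among called genotypes
    an: int  # total called alleles (2 x number of called diploid genotypes)


def allele_frequency(counts: PopulationCounts) -> float:
    """Return AC / AN, or 0.0 when no alleles were called (AN == 0)."""
    if counts.an == 0:
        return 0.0
    return counts.ac / counts.an


def popmax(site: dict[str, PopulationCounts]) -> tuple[str, float]:
    """Return the population with the highest allele frequency and that frequency."""
    best = max(site, key=lambda pop: allele_frequency(site[pop]))
    return best, allele_frequency(site[best])


# Hypothetical counts for one variant site (illustrative only, not ExAC data).
site = {
    "AFR": PopulationCounts(ac=12, an=10000),
    "AMR": PopulationCounts(ac=3, an=11000),
    "EAS": PopulationCounts(ac=0, an=8000),
    "FIN": PopulationCounts(ac=1, an=6000),
    "NFE": PopulationCounts(ac=5, an=60000),
    "SAS": PopulationCounts(ac=2, an=15000),
}

# The global frequency pools counts across populations before dividing.
total = PopulationCounts(
    ac=sum(c.ac for c in site.values()),
    an=sum(c.an for c in site.values()),
)
print(f"global AF = {allele_frequency(total):.2e}")
print("highest-frequency population:", popmax(site))
```

In this toy site the pooled frequency is about 2 in 10,000, while one population carries the allele at roughly six times that rate. This is why per-population frequencies, and not only the pooled value, matter when judging whether a variant is rare enough to cause a rare disease.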

Methods and Quality Control

ExAC reprocessed the contributed data through a centralized pipeline built on tools and frameworks developed at the Broad Institute and its collaborators. Variants were called jointly across samples with the Genome Analysis Toolkit (GATK) HaplotypeCaller and joint genotyping, a strategy of the kind used in projects such as the 1000 Genomes Project, so that every site was genotyped consistently regardless of which study contributed a sample. Quality control drew on GATK and Picard together with best practices from sequencing centers including the Wellcome Sanger Institute. Principal component analysis against reference populations from HapMap and the 1000 Genomes Project was used to infer ancestry and control population stratification, and related individuals were removed using relatedness-filtering methods practiced at Massachusetts General Hospital and the Broad Institute. Site-level filters (including GATK Variant Quality Score Recalibration), depth and genotype-quality thresholds, and per-exome metrics were benchmarked against the well-characterized reference sample NA12878 and compared with variant catalogs such as dbSNP and ClinVar.
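Depth and genotype-quality thresholds act on individual genotypes: a call that fails them is treated as missing and left out of both the allele count and the allele number. The sketch below illustrates that idea. The record type, function names, and the thresholds DP ≥ 10 and GQ ≥ 20 are assumptions chosen for illustration, not a statement of ExAC's exact pipeline configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Genotype:
    """One sample's diploid call at a site (hypothetical record type)."""
    alt_alleles: Optional[int]  # 0, 1 or 2 alternate alleles; None if no call
    dp: int  # read depth at the site
    gq: int  # Phred-scaled genotype quality


# Illustrative thresholds (assumptions for this sketch).
MIN_DP = 10
MIN_GQ = 20


def passes_qc(g: Genotype) -> bool:
    """A genotype counts only if it was called and meets both thresholds."""
    return g.alt_alleles is not None and g.dp >= MIN_DP and g.gq >= MIN_GQ


def adjusted_counts(genotypes: list[Genotype]) -> tuple[int, int]:
    """Return (AC, AN) computed only from genotypes that pass the filters."""
    kept = [g for g in genotypes if passes_qc(g)]
    ac = sum(g.alt_alleles or 0 for g in kept)
    an = 2 * len(kept)  # each kept diploid genotype contributes two alleles
    return ac, an


calls = [
    Genotype(alt_alleles=0, dp=35, gq=99),
    Genotype(alt_alleles=1, dp=28, gq=80),
    Genotype(alt_alleles=1, dp=6, gq=15),    # low depth: treated as missing
    Genotype(alt_alleles=0, dp=12, gq=18),   # low GQ: treated as missing
    Genotype(alt_alleles=None, dp=0, gq=0),  # no call
    Genotype(alt_alleles=2, dp=40, gq=99),
]
ac, an = adjusted_counts(calls)
print(f"AC={ac} AN={an} AF={ac / an:.3f}")
```

Removing a failed genotype from the allele number, rather than counting it as homozygous reference, keeps the frequency an estimate over confidently genotyped individuals, so poorly covered sites do not have their frequencies artificially deflated.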

Key Findings and Publications

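The flagship ExAC paper (Lek et al., Nature, 2016), whose authors included Daniel G. MacArthur and Mark J. Daly, reported about 7.4 million high-quality variants, an average of one variant every eight bases of the exome, the great majority of them rare. It showed that rare protein-truncating variants are widespread, revised estimates of population allele frequencies, and identified 3,230 genes with near-complete depletion of predicted protein-truncating variants, 72% of which had no established human disease phenotype at the time. ExAC data also prompted reclassification of variants previously recorded as pathogenic in databases such as HGMD and ClinVar. The papers documented how common heterozygous loss-of-function variants are, including in genes catalogued in OMIM, and challenged pathogenicity claims from specific case reports, including reports in journals such as Nature, Science, and The New England Journal of Medicine, by showing that some reported variants were too common in the general population to cause the rare disorders attributed to them. Follow-up studies integrated ExAC frequencies into variant interpretation workflows at clinical sites including Mayo Clinic and Johns Hopkins, and ExAC influenced guideline updates by American College of Medical Genetics and Genomics authors. Subsequent resources, most notably the Genome Aggregation Database (gnomAD), expanded on ExAC's published methods and data, and the peer-reviewed articles describing them build on ExAC's findings.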
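Statements about how many heterozygous loss-of-function variants individuals carry rest on annotating each variant's consequence and then counting truncating variants per person. The sketch below shows only that counting step. It assumes hypothetical annotated records and a simplified definition of "protein-truncating" as four Sequence Ontology consequence terms. ExAC's own loss-of-function annotation applied further filtering (the LOFTEE plugin for Ensembl VEP), which is not reproduced here.

```python
from collections import Counter

# Sequence Ontology consequence terms treated here as protein-truncating.
# This set is a simplification; real pipelines filter these calls further.
PTV_CONSEQUENCES = frozenset({
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
})

# Hypothetical annotated calls: (sample_id, gene, consequence, alt_allele_count).
CALLS = [
    ("s1", "GENE1", "stop_gained", 1),
    ("s1", "GENE2", "missense_variant", 1),
    ("s1", "GENE3", "frameshift_variant", 1),
    ("s2", "GENE1", "synonymous_variant", 1),
    ("s2", "GENE4", "splice_donor_variant", 2),  # homozygous: not counted
    ("s3", "GENE5", "splice_acceptor_variant", 1),
]


def heterozygous_ptv_counts(calls: list[tuple[str, str, str, int]]) -> Counter:
    """Count heterozygous protein-truncating variants carried by each sample."""
    counts: Counter = Counter()
    for sample, _gene, consequence, alt_alleles in calls:
        if consequence in PTV_CONSEQUENCES and alt_alleles == 1:
            counts[sample] += 1
    return counts


per_sample = heterozygous_ptv_counts(CALLS)
for sample in ("s1", "s2", "s3"):
    print(sample, per_sample[sample])
```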

Impact on Clinical and Research Genomics

ExAC altered diagnostic practice at hospitals and laboratories including Mayo Clinic, Children's Hospital of Philadelphia, Boston Children's Hospital, Sheffield Children's Hospital, Great Ormond Street Hospital, and commercial providers such as Invitae and GeneDx. It did so by supplying population allele frequencies against which frequency thresholds could be applied when adjudicating variant pathogenicity. Researchers at institutions such as the Broad Institute, Wellcome Sanger Institute, Stanford University School of Medicine, and Cambridge University Hospitals NHS Foundation Trust, and in consortia such as Deciphering Developmental Disorders, used ExAC to prioritize candidate genes in studies of rare disorders. The resource was also used in cancer predisposition studies at Memorial Sloan Kettering Cancer Center and in population genetics analyses at the University of California, Berkeley and Princeton University. ExAC influenced bioinformatics tools and annotation services from Ensembl, the UCSC Genome Browser, and the developers of ANNOVAR, as well as commercial vendors including Illumina.
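Reference frequencies do not classify a variant by themselves. A laboratory compares them with cutoffs, such as the stand-alone 5% benign cutoff of the ACMG/AMP 2015 BA1 criterion, which names ExAC among its reference datasets, and a stricter disease-specific ceiling. The sketch below applies both to each variant's highest population frequency. The disease-specific ceiling of 1e-4, the variant names, and their frequencies are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Stand-alone "too common" cutoff of 5% (ACMG/AMP 2015 BA1 criterion).
BA1_AF = 0.05

# Hypothetical disease-specific ceiling: the highest allele frequency judged
# compatible with a rare disorder. Real values depend on prevalence, penetrance
# and genetic architecture; 1e-4 is an assumption for illustration.
MAX_CREDIBLE_AF = 1e-4


@dataclass(frozen=True)
class Variant:
    name: str         # hypothetical variant label
    popmax_af: float  # highest allele frequency across reference populations


def frequency_flag(v: Variant) -> str:
    """Classify a variant by frequency alone (one line of evidence, not a verdict)."""
    if v.popmax_af > BA1_AF:
        return "benign by frequency (stand-alone)"
    if v.popmax_af > MAX_CREDIBLE_AF:
        return "too common for this disorder"
    return "frequency compatible"


candidates = [
    Variant("variant_A", popmax_af=0.12),
    Variant("variant_B", popmax_af=0.0015),
    Variant("variant_C", popmax_af=0.00002),
]
for v in candidates:
    print(v.name, "->", frequency_flag(v))
```

A variant that passes the frequency check is not thereby shown to be pathogenic. It only remains a candidate, to be weighed against segregation, functional, and computational evidence.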

Limitations and Ethical Considerations

ExAC had limitations highlighted by ethicists and geneticists at centers such as the Broad Institute, Wellcome Sanger Institute, Harvard Medical School, and the University of Oxford. Global representation was uneven: most samples were of European ancestry, and many populations were sparsely sampled or absent. Population stratification posed challenges of the kind noted in analyses of 1000 Genomes Project data, and data sharing was constrained by consent and data-use policies negotiated with institutions including the NIH, the Wellcome Trust, and participating university review boards. Ethical debates about return of results, privacy, and recontact engaged stakeholders from the American College of Medical Genetics and Genomics, research funders such as the Wellcome Trust and the NIH, and patient advocacy groups such as Autism Speaks. Technical caveats included limited detection of structural variants compared with whole-genome projects such as TOPMed, restriction to the targeted protein-coding portion of the genome, and differences in exome capture and sequencing data across contributing centers such as the Broad Institute and the Wellcome Sanger Institute.
