1000 Genomes Project

Name	1000 Genomes Project
Start	2008
End	2015
Participants	~2,500
Funding	Wellcome Trust, National Institutes of Health, Wellcome Trust Sanger Institute
Coordinating institutions	International Genome Sample Resource, Wellcome Trust Sanger Institute, National Human Genome Research Institute
Genome build	GRCh37

The 1000 Genomes Project was an international consortium that produced a detailed catalog of human genetic variation by sequencing the genomes of individuals from diverse populations; its final data set covered 2,504 individuals from 26 populations. Building on the Human Genome Project and the International HapMap Project, and supported by partners including the National Institutes of Health, the Wellcome Trust, and the European Bioinformatics Institute, the initiative aimed to create a public reference resource for human genetics research. Its outcomes informed work at institutions including the Wellcome Trust Sanger Institute, the Broad Institute, the National Human Genome Research Institute, and the European Molecular Biology Laboratory, as well as later efforts such as Genomics England.

Background and objectives

The project emerged as sequencing advances at the Sanger Institute and the Broad Institute, and at companies such as Illumina and Applied Biosystems, made large-scale sequencing feasible. Building on the Human Genome Project and the International HapMap Project, the consortium set out to catalog common and low-frequency variants across populations sampled in countries including China, Nigeria, Peru, and the United Kingdom, among them the Yoruba of Ibadan, Nigeria. A central objective was a publicly accessible variant map to aid disease-mapping efforts by groups such as the Wellcome Trust Case Control Consortium and the International Cancer Genome Consortium, as well as by clinical genetics centers at the Mayo Clinic and Johns Hopkins University.

Study design

The design combined low-coverage whole-genome sequencing, high-coverage exome sequencing, and dense genotyping arrays, with much of the data produced by groups at the Wellcome Trust Sanger Institute and the Broad Institute. Samples were drawn from populations worldwide, including cohorts previously studied by the International HapMap Project and new panels collected by consortium partners in places such as Beijing, Guangzhou, Lima, and Ibadan. Sequencing relied mainly on Illumina platforms, with additional data from Life Technologies instruments, and the reads were processed by bioinformatics pipelines developed at the European Bioinformatics Institute and the National Center for Biotechnology Information; the resulting data are now maintained by the International Genome Sample Resource. Variant calling combined tools such as GATK, SAMtools, and BCFtools with statistical phasing methods developed by investigators at institutions including Stanford University, the University of Oxford, and Harvard University.
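
Why a low-coverage design can still detect variants shared by several people can be illustrated with a simple calculation. The Python sketch below assumes that the number of reads covering a site follows a Poisson distribution with mean equal to the sequencing depth, that a heterozygous carrier contributes alternate-allele reads at half that rate, and that samples are independent. The depth and carrier counts are illustrative assumptions rather than the project's actual parameters, and the function names are hypothetical.

```python
import math


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def prob_alt_read_seen(depth: float, carriers: int) -> float:
    """Probability that at least one alternate-allele read is observed in total
    across `carriers` heterozygous samples sequenced to mean `depth`.

    Assumption: each carrier's alternate-read count is Poisson(depth / 2) and
    carriers are independent, so the pooled count is Poisson(carriers * depth / 2).
    """
    return prob_at_least(1, carriers * depth / 2.0)


if __name__ == "__main__":
    depth = 4.0  # illustrative low-coverage depth (assumption)
    for carriers in (1, 2, 5, 10):
        p = prob_alt_read_seen(depth, carriers)
        print(f"{carriers:>2} carrier(s) at {depth}x: P(any alt read) = {p:.6f}")
```

Under these assumptions a single heterozygous carrier at 4x yields no alternate read with probability e^-2 (about 13.5%), whereas ten carriers pooled together almost never do. That is the intuition behind calling variants jointly across many low-coverage samples instead of one sample at a time.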

Data generation and analysis

Sequencing produced short-read data that were aligned to the GRCh37 reference assembly with mapping tools such as BWA, developed at the Wellcome Trust Sanger Institute, alongside methods from the Broad Institute. Joint-calling pipelines then combined the evidence from all samples at each site, using methods akin to those developed at the University of Washington and the European Molecular Biology Laboratory; pooling reads in this way lets a variant carried by several low-coverage samples be detected even when each carrier contributes only a few supporting reads. Downstream analyses characterized single-nucleotide polymorphisms (SNPs), short insertions and deletions (indels), and structural variants, drawing on comparative resources such as the dbSNP database and annotation frameworks such as Ensembl. Quality control and population-genetic analyses used methods and concepts developed in studies at the Max Planck Institute for Evolutionary Anthropology, the University of Chicago, and the Wellcome Trust Centre for Human Genetics.
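
The project released its variant calls in the Variant Call Format (VCF), which was developed in the course of the project. In a VCF data line, each sample column begins with a genotype (GT) such as 0|1 (phased) or 0/1 (unphased). As a hedged illustration of how an allele frequency follows from such calls, the Python sketch below counts alternate alleles across the sample columns of one biallelic VCF line. It assumes GT is the first FORMAT key and skips missing alleles ("."); the example record and function names are invented, and this is not the project's own pipeline.

```python
from typing import List


def parse_gt(sample_field: str) -> List[int]:
    """Return the called allele indices in a VCF sample field such as '0|1' or '1/1:35'.

    Assumes GT is the first FORMAT key; missing alleles ('.') are skipped.
    """
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")
    return [int(a) for a in alleles if a != "."]


def alt_allele_frequency(vcf_line: str) -> float:
    """Alternate allele frequency for one biallelic VCF data line.

    Columns 1-9 are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT;
    sample genotypes start at column 10 (index 9).
    """
    columns = vcf_line.rstrip("\n").split("\t")
    called = [a for sample in columns[9:] for a in parse_gt(sample)]
    if not called:
        raise ValueError("no called alleles on this line")
    # In a biallelic record every non-zero allele index is the ALT allele.
    return sum(1 for a in called if a != 0) / len(called)


if __name__ == "__main__":
    # Invented record with four samples; the last one is partly missing.
    record = "\t".join(
        ["1", "12345", ".", "A", "G", "100", "PASS", ".", "GT",
         "0|1", "0|0", "1|1", ".|0"]
    )
    print(alt_allele_frequency(record))
```

Missing alleles are excluded from the denominator, so the frequency is computed over called alleles only; a real analysis would also handle multiallelic records and site-level filters.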

Findings

The final phase of the project reported more than 88 million variants, refining allele frequency estimates for common and low-frequency variants across the sampled populations. It revealed patterns of population structure consistent with migration histories inferred by researchers at the Max Planck Institute for Evolutionary Anthropology and with demographic models from teams at the University of California, Berkeley, and Princeton University. The catalog improved the reference panels used for imputation in genome-wide association studies, such as those of the Wellcome Trust Case Control Consortium, and informed variant-interpretation pipelines at clinical centers including the Mayo Clinic and Massachusetts General Hospital. The project also advanced the discovery of structural variation, characterized by the 1000 Genomes Structural Variation Analysis Group in parallel with work by consortia associated with the International Cancer Genome Consortium.
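
Per-population allele frequencies, and the split between common, low-frequency, and rare variants, can be sketched as follows. The Python example assumes diploid alternate-allele dosages (0, 1, or 2) grouped by 1000 Genomes population code, and classifies each population's minor allele frequency (MAF) with cut-offs of 5% and 0.5%, a widely used convention that is assumed here rather than stated in this article. The dosages and function names are invented for illustration.

```python
from typing import Dict, List

COMMON_MAF = 0.05          # assumed cut-off: MAF >= 5% counts as common
LOW_FREQUENCY_MAF = 0.005  # assumed cut-off: 0.5% <= MAF < 5% is low-frequency


def per_population_af(dosages: Dict[str, List[int]]) -> Dict[str, float]:
    """Alternate allele frequency per population from diploid dosages (0, 1 or 2)."""
    return {pop: sum(d) / (2 * len(d)) for pop, d in dosages.items() if d}


def frequency_class(af: float) -> str:
    """Classify a site in one population by its minor allele frequency."""
    maf = min(af, 1.0 - af)
    if maf == 0.0:
        return "monomorphic"
    if maf >= COMMON_MAF:
        return "common"
    if maf >= LOW_FREQUENCY_MAF:
        return "low-frequency"
    return "rare"


if __name__ == "__main__":
    # Invented dosages for one site; YRI, CHB and PEL are real population codes.
    site = {
        "YRI": [0, 1, 0, 0, 2, 0, 1, 0, 0, 0],
        "CHB": [0] * 10,
        "PEL": [0] * 49 + [1],
    }
    for pop, af in per_population_af(site).items():
        print(pop, round(af, 4), frequency_class(af))
```

The same site can fall into different classes in different populations, which is why population-specific frequencies were more informative than a single pooled estimate.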

Applications

The public reference supported association analyses in projects backed by the Wellcome Trust and the NIH, and in disease consortia such as the Psychiatric Genomics Consortium and the Alzheimer's Disease Genetics Consortium. Researchers at the Broad Institute, Stanford University, and the University of Oxford used its haplotypes and allele frequencies as a reference panel to improve genotype imputation accuracy in cohorts such as the Framingham Heart Study and UK Biobank, and in population studies in Iceland and Finland. The dataset influenced tool development at Google DeepMind, Illumina, and academic labs at the University of Cambridge and MIT, and served as a benchmark for later efforts, including Genomics England and precision-medicine initiatives in the National Health Service and at the NIH Clinical Center.
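
Genotype imputation with a reference panel can be illustrated with a deliberately simplified stand-in. Production imputation software generally uses haplotype-copying hidden Markov models; the Python sketch below instead fills the untyped sites of one target haplotype by copying from whichever reference haplotype has the fewest mismatches at the typed sites. The panel, target, and function name are hypothetical.

```python
from typing import List, Optional

Haplotype = List[int]          # 0 = reference allele, 1 = alternate allele
Target = List[Optional[int]]   # None marks an untyped site to be imputed


def impute_nearest_haplotype(target: Target, panel: List[Haplotype]) -> Haplotype:
    """Fill untyped sites by copying the closest reference haplotype.

    Closeness is the number of mismatches at typed sites; ties go to the
    haplotype listed first. A toy stand-in for HMM-based imputation.
    """
    if not panel:
        raise ValueError("reference panel is empty")
    if any(len(h) != len(target) for h in panel):
        raise ValueError("panel haplotypes must match the target length")
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(hap: Haplotype) -> int:
        return sum(1 for i in typed if hap[i] != target[i])

    best = min(panel, key=mismatches)  # min() keeps the first of equal minima
    return [allele if allele is not None else best[i] for i, allele in enumerate(target)]


if __name__ == "__main__":
    panel = [
        [0, 0, 1, 1, 0],
        [1, 1, 0, 0, 1],
        [0, 1, 1, 0, 0],
    ]
    target = [1, None, 0, None, 1]
    print(impute_nearest_haplotype(target, panel))
```

A larger and more diverse panel raises the chance that some reference haplotype closely matches a given target, which is one intuition for why a multi-population reference improved imputation across cohorts of different ancestries.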

Ethics and data governance

The consortium navigated consent frameworks, data-access policies, and privacy considerations shaped by standards at the Wellcome Trust, the National Institutes of Health, and the European Bioinformatics Institute, and by ethics committees at institutions such as Harvard Medical School and the University of Oxford. Samples came from anonymous donors who consented to open data release, and no medical or phenotype information was collected, so the sequence data could be published without access restrictions rather than through controlled-access archives such as the European Genome-phenome Archive; governance questions of this kind were later taken up by the Global Alliance for Genomics and Health. The project stimulated debate at venues including the Royal Society and the World Health Organization, and in scholarship from bioethics centers at Georgetown University and Yale University, about re-identification risk, benefit sharing, and the implications for the populations represented, including those in Nigeria, Peru, and China.