LLMpedia: The first transparent, open encyclopedia generated by LLMs

1000 Genomes Project

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: 1000 Genomes Project
Start: 2008
End: 2015
Participants: about 2,500 individuals (2,504 in the final phase) from 26 populations
Funding: Wellcome Trust, National Institutes of Health, Wellcome Trust Sanger Institute
Coordinating institutions: International Genome Sample Resource, Wellcome Trust Sanger Institute, National Human Genome Research Institute
Genome build: GRCh37

The 1000 Genomes Project was an international research consortium. It produced a detailed public catalog of human genetic variation by sequencing the genomes of individuals from diverse populations. The project built on the Human Genome Project and the International HapMap Project and aimed to create a freely available reference resource for researchers. Funders and partners included the National Institutes of Health, the Wellcome Trust, and the European Bioinformatics Institute. Its data informed research at institutions including the Wellcome Trust Sanger Institute, the Broad Institute, the National Human Genome Research Institute, the European Molecular Biology Laboratory, and Genomics England.

Background and objectives

The project emerged as large-scale sequencing became feasible. Two developments made this possible: sequencing technologies from companies such as Illumina and Applied Biosystems, and production pipelines at the Sanger Institute and the Broad Institute. Building on the Human Genome Project and the International HapMap Project, the consortium sought to catalog common and low-frequency variants. Its stated aim was to find most variants with a frequency of at least 1% in each population studied. Samples came from populations on several continents, including:

- the Yoruba in Ibadan, Nigeria
- Han Chinese in Beijing
- Peruvians in Lima
- British individuals in England and Scotland

A central objective was a publicly accessible variant map to aid disease mapping. Intended users included groups such as the Wellcome Trust Case Control Consortium and the International Cancer Genome Consortium, and clinical genetics centers such as the Mayo Clinic and Johns Hopkins University.

Study design and methodology

The design combined low-coverage whole-genome sequencing, deep exome sequencing, and dense genotyping arrays. Data were produced at sequencing centers including the Wellcome Trust Sanger Institute and the Broad Institute. Some samples came from populations previously studied by the International HapMap Project, and others from newly collected panels in places such as Beijing, Lima, and Ibadan. Illumina sequencing platforms were used alongside other technologies, including systems from Life Technologies. The data were processed with bioinformatics pipelines developed by teams at the European Bioinformatics Institute and the National Center for Biotechnology Information. Variant calling drew on tools such as GATK, SAMtools, and BCFtools. Haplotypes were estimated with statistical phasing methods developed by investigators at institutions including Stanford University, the University of Oxford, and Harvard University.
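Genotype likelihoods show why sequencing each genome at low coverage can still yield usable calls. The sketch below uses a simplified model that is an assumption, not the consortium's calling pipeline. Each read independently reports one of the two chromosomes, and a base is misread with a fixed error rate (1% here, chosen for illustration). The resulting likelihoods are combined with a Hardy-Weinberg prior built from a population allele frequency. The function names are hypothetical.

```python
from math import comb


def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> list:
    """Return P(read counts | genotype) for genotypes carrying 0, 1 or 2 alt alleles.

    Assumes (hypothetically) that reads are independent draws from the two
    chromosomes and that each base is misread with a fixed probability `error`.
    """
    n = ref_reads + alt_reads
    likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        # Probability that one read shows the alt base: it comes from an alt
        # chromosome and is read correctly, or from a ref chromosome and is misread.
        p_alt = frac_alt * (1 - error) + (1 - frac_alt) * error
        likelihoods.append(comb(n, alt_reads) * p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return likelihoods


def genotype_posterior(likelihoods: list, alt_freq: float) -> list:
    """Combine likelihoods with a Hardy-Weinberg prior from a population allele frequency."""
    q = alt_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    joint = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # One reference read and one alternate read at a site where the
    # (hypothetical) population alt allele frequency is 10%.
    lik = genotype_likelihoods(ref_reads=1, alt_reads=1)
    post = genotype_posterior(lik, alt_freq=0.1)
    for label, p in zip(("hom-ref", "het", "hom-alt"), post):
        print(f"{label}: {p:.3f}")
```

With only a few reads per site, the population prior can decide the called genotype. This is why population-based callers typically pool information across many samples, and often across linked sites, when coverage per individual is low.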

Data generation and analysis

Sequencing produced short-read data, which were aligned to the GRCh37 reference genome using mapping tools such as those developed at the Wellcome Trust Sanger Institute and the Broad Institute. Joint-calling pipelines then combined evidence across all samples to call variants, using methods similar to those developed at the University of Washington and the European Molecular Biology Laboratory. Downstream analyses characterized three classes of variant: single-nucleotide polymorphisms (SNPs), small insertions and deletions (indels), and structural variants. These analyses drew on resources such as the dbSNP database and annotation frameworks such as Ensembl. Quality control and population-genetic analyses used methods developed in studies at the Max Planck Institute for Evolutionary Anthropology, the University of Chicago, and the Wellcome Trust Centre for Human Genetics.
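The project's call sets are distributed as VCF (Variant Call Format) files, so sorting calls into SNPs, indels and structural variants starts from VCF records. The sketch below parses one tab-separated VCF data line and classifies each ALT allele with a deliberately rough rule. That rule is an assumption for illustration, not part of the VCF standard. The sketch also computes each ALT allele's frequency from the samples' GT calls. Real structural-variant records carry additional INFO fields (such as SVTYPE), and production code would normally use a library such as pysam or cyvcf2. The function names here are hypothetical.

```python
import re


def classify_allele(ref: str, alt: str) -> str:
    """Very rough variant-type rule (illustrative assumption, not a VCF standard)."""
    if alt.startswith("<") and alt.endswith(">"):
        return "structural"  # symbolic allele such as <DEL> or <DUP>
    if "[" in alt or "]" in alt:
        return "structural"  # breakend notation
    if alt in ("*", "."):
        return "other"  # spanning deletion or missing allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def parse_vcf_record(line: str) -> dict:
    """Parse one VCF data line and compute per-ALT allele frequencies from GT calls."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    fmt_keys = fields[8].split(":")
    gt_index = fmt_keys.index("GT")

    # Collect every called allele index ("0", "1", ...) across samples,
    # treating phased "|" and unphased "/" separators alike and skipping ".".
    called = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        called.extend(a for a in re.split(r"[|/]", gt) if a != ".")

    total = len(called)
    result = []
    for i, alt in enumerate(alts, start=1):
        freq = called.count(str(i)) / total if total else 0.0
        result.append((alt, classify_allele(ref, alt), freq))
    return {"chrom": chrom, "pos": pos, "ref": ref, "alts": result}


if __name__ == "__main__":
    # Hypothetical record with three samples (not real project data).
    record = "22\t16050075\t.\tA\tG\t100\tPASS\t.\tGT\t0|0\t0|1\t1|1"
    print(parse_vcf_record(record))
```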

Major findings and contributions

In its final phase, the project documented more than 88 million variant sites. It refined allele frequency estimates for common and low-frequency variants across the sampled populations, with collaborators including the Wellcome Trust Sanger Institute, the Broad Institute, Harvard Medical School, and Stanford University. It also revealed patterns of population structure. These patterns were consistent with the human migrations inferred by researchers at the Max Planck Institute for Evolutionary Anthropology and with demographic models from teams at the University of California, Berkeley, and Princeton University. The catalog improved the reference panels used for imputation in genome-wide association studies, such as those of the Wellcome Trust Case Control Consortium. It also influenced variant interpretation pipelines at clinical centers including the Mayo Clinic and Massachusetts General Hospital. The project's Structural Variation Analysis Group characterized large numbers of structural variants, paralleling work on structural variation in consortia such as the International Cancer Genome Consortium.
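Population structure of the kind described above is often summarized with F_ST, which compares allele frequencies between populations. The sketch below implements Hudson's F_ST estimator, combined across sites as a ratio of averages as recommended by Bhatia et al. (2013). It is an illustrative choice and not necessarily the statistic used in the project's own analyses. The allele frequencies and sample sizes in the usage example are hypothetical.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's F_ST, combined across sites as a ratio of averages.

    p1, p2: per-site sample alt allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes (haplotypes) in each population.
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must cover the same sites")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite-sample noise.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one chromosome drawn from each population carries different alleles.
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0:
        raise ValueError("no site is polymorphic across the two populations")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical allele frequencies at four sites for two populations,
    # each with 200 sampled chromosomes.
    pop_a = [0.10, 0.50, 0.80, 0.30]
    pop_b = [0.40, 0.45, 0.20, 0.35]
    print(f"F_ST = {hudson_fst(pop_a, pop_b, 200, 200):.3f}")
```

Summing the numerator and denominator before dividing, rather than averaging per-site ratios, keeps rare variants from dominating the estimate. For populations with identical frequencies, the sampling correction can make the value slightly negative.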

Applications and impact

The public reference supported association analyses in projects funded by the Wellcome Trust and the NIH, and in disease consortia such as the Psychiatric Genomics Consortium and the Alzheimer's Disease Genetics Consortium. Researchers at the Broad Institute, Stanford University, and the University of Oxford used its variant frequencies and haplotypes to improve genotype imputation accuracy. Cohorts that benefited include the Framingham Heart Study, UK Biobank, and population studies in Iceland and Finland. The dataset influenced tool development at Google and Illumina and in academic labs at the University of Cambridge and MIT. It also served as a benchmark for later efforts, including Genomics England and precision-medicine initiatives within the National Health Service and at the NIH Clinical Center.
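Reference-panel imputation infers alleles at sites a study did not genotype by matching each study haplotype to haplotypes in the panel. Production tools such as IMPUTE2, Beagle and Minimac do this with hidden Markov models based on the Li and Stephens model. The sketch below swaps in a much cruder nearest-neighbour majority vote over whole haplotypes, purely to illustrate the idea. It assumes biallelic sites coded 0/1, missing alleles marked None, and a hypothetical reference panel. The function name is also hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def impute_haplotype(
    target: Sequence[Optional[int]],
    reference_panel: Sequence[Sequence[int]],
    k: int = 3,
) -> List[int]:
    """Fill missing (None) alleles in `target` by majority vote among the k
    reference haplotypes that match it at the most observed sites.

    A toy nearest-neighbour stand-in for HMM-based imputation.
    """
    observed = [i for i, allele in enumerate(target) if allele is not None]

    def agreement(hap: Sequence[int]) -> int:
        return sum(hap[i] == target[i] for i in observed)

    # sorted() is stable, so ties keep the panel's original order.
    nearest = sorted(reference_panel, key=agreement, reverse=True)[:k]

    imputed = []
    for i, allele in enumerate(target):
        if allele is not None:
            imputed.append(allele)
        else:
            votes = Counter(hap[i] for hap in nearest)
            imputed.append(votes.most_common(1)[0][0])
    return imputed


if __name__ == "__main__":
    # Hypothetical panel of four reference haplotypes over five sites (0 = ref, 1 = alt).
    panel = [
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 1],
        [1, 0, 0, 0, 1],
        [0, 1, 0, 0, 0],
    ]
    study = [0, 1, None, 1, None]  # sites 2 and 4 were not genotyped
    print(impute_haplotype(study, panel))  # → [0, 1, 1, 1, 0]
```

An odd k avoids tied votes at biallelic sites. Unlike an HMM, this approach ignores recombination, so it cannot copy different panel haplotypes along different stretches of the chromosome.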

Ethics and data sharing

The consortium worked within consent frameworks, data access policies, and privacy standards shaped by the Wellcome Trust, the National Institutes of Health, and the European Bioinformatics Institute. Ethics committees at institutions such as Harvard Medical School and the University of Oxford also informed this work. Samples were collected anonymously, without medical or phenotypic information, and donors consented to public release of their data. The data were therefore distributed openly rather than through controlled-access archives such as the European Genome-phenome Archive. Broader governance of genomic data sharing was later taken up by bodies such as the Global Alliance for Genomics and Health. The project also stimulated debate about re-identification risk, benefit sharing, and implications for the populations represented, from places such as Nigeria, Peru, and China. This debate took place at meetings of the Royal Society, in policy discussions at the World Health Organization, and in scholarship from bioethics centers at Georgetown University and Yale University.
