1000 Genomes Project

Name	1000 Genomes Project
Formation	2008
Founder	Consortium including the Wellcome Trust Sanger Institute, the National Human Genome Research Institute, and the Beijing Genomics Institute
Type	International consortium
Purpose	Catalog human genetic variation
Headquarters	Multiple institutes worldwide
Region served	Global
Website	www.internationalgenome.org

Contents

Project overview
Data generation and methodology
Key findings and scientific impact
Data access and usage
Ethical, legal, and social implications

The 1000 Genomes Project was a landmark international research effort to create the most detailed public catalog of human genetic variation. Launched in 2008, it aimed to sequence the genomes of at least a thousand individuals from diverse populations worldwide. The project's comprehensive dataset has become a foundational resource for biomedical research, enabling studies on the genetic basis of disease, population genetics, and evolutionary biology.

Project overview

The initiative was conceived and launched by a consortium of leading research organizations, including the Wellcome Trust Sanger Institute, the National Human Genome Research Institute, and the Beijing Genomics Institute. Its primary goal was to build on the reference human genome sequence produced by the Human Genome Project by characterizing genetic variants present at a frequency of 1% or higher across global populations. The project's steering committee was co-chaired by Richard Durbin and David Altshuler. It represented a major collaborative effort in genomics, involving dozens of research centers across North America, Europe, East Asia, and Africa.

Data generation and methodology

The project combined whole-genome sequencing with high-density SNP genotyping and proceeded in a series of phases. The initial pilot phases included parent-offspring trios from the Yoruba people in Ibadan, Nigeria, and from individuals of Northern European ancestry in Utah, along with samples from populations in Tokyo, Japan, and Beijing, China. Later phases expanded significantly, ultimately including 26 distinct populations, such as the Luhya people in Webuye, Kenya, and the Colombian population in Medellín. Sequencing was performed on platforms from Illumina and Complete Genomics, with data harmonization and analysis coordinated at centers such as the European Bioinformatics Institute.

Key findings and scientific impact

The final dataset, published in 2015, contained variants from 2,504 individuals and identified over 88 million genetic variants, including single nucleotide polymorphisms, short insertion-deletion polymorphisms, and structural variants. A major finding was that a typical individual's genome differs from the human reference genome at 4 to 5 million sites, while the vast majority of variant sites across the whole sample are rare. The resource has been instrumental for genome-wide association studies, helping to fine-map disease loci and interpret non-coding variants. It has also provided critical insights into human migration patterns and population history, supporting research published in journals such as *Nature* and *Science*.
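To make the frequency terms above concrete, the following Python sketch computes the alternate-allele frequency at a site, flags a variant as rare against a 1% minor-allele-frequency cutoff (an assumption chosen to echo the project's stated 1% target, not the project's own classification scheme), and counts the sites at which one individual carries a non-reference allele. The cohort and genotype strings are hypothetical toy data, and missing genotype calls are assumed to be absent.

```python
from typing import Dict, List

# Hypothetical toy cohort; the real final release contained 2,504 individuals.
N = 100

# One diploid genotype per individual at each site:
# "0" = reference allele, "1" = alternate allele. No missing calls assumed.
sites: Dict[str, List[str]] = {
    # a single heterozygous carrier: 1 alternate allele out of 200
    "singleton": ["0|1"] + ["0|0"] * (N - 1),
    # 40 heterozygotes and 10 homozygotes: 60 alternate alleles out of 200
    "common": ["0|1"] * 40 + ["1|1"] * 10 + ["0|0"] * (N - 50),
}


def alleles(genotype: str) -> List[str]:
    """Split a phased ("|") or unphased ("/") genotype into its alleles."""
    return genotype.replace("/", "|").split("|")


def alt_allele_frequency(genotypes: List[str]) -> float:
    """Fraction of allele copies at a site that are not the reference allele."""
    calls = [a for gt in genotypes for a in alleles(gt)]
    return sum(a != "0" for a in calls) / len(calls)


def is_rare(freq: float, threshold: float = 0.01) -> bool:
    """True if the minor allele frequency is above zero but below the cutoff.

    The 1% default is an assumption echoing the project's stated target.
    """
    maf = min(freq, 1.0 - freq)
    return 0.0 < maf < threshold


def non_reference_sites(all_sites: Dict[str, List[str]], individual: int) -> int:
    """Count sites where an individual carries at least one non-reference allele."""
    return sum(
        any(a != "0" for a in alleles(gts[individual])) for gts in all_sites.values()
    )


if __name__ == "__main__":
    for name, gts in sites.items():
        f = alt_allele_frequency(gts)
        print(name, f, "rare" if is_rare(f) else "not rare")
    print("individual 0 differs from the reference at", non_reference_sites(sites, 0), "sites")
```

Applied to a real call set, the same per-individual count is what yields the figure of several million non-reference sites per genome, while the per-site frequency shows why most variant sites fall below a rarity cutoff.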

Data access and usage

All data from the project were released into the public domain without restriction, adhering to the Fort Lauderdale Agreement and Bermuda Principles for genomic data sharing. The primary data repository is hosted at the European Bioinformatics Institute as part of the European Nucleotide Archive. Key datasets are also mirrored and integrated into major bioinformatics resources like the UCSC Genome Browser, the NCBI dbSNP database, and the Ensembl project. This open-access policy has enabled its use in thousands of studies, from investigations into Alzheimer's disease at the Broad Institute to population studies by the Max Planck Institute for Evolutionary Anthropology.
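The project's variant releases are distributed as Variant Call Format (VCF) files. As an illustration of reading such files, the minimal Python sketch below parses VCF text lines into records holding the chromosome, position, alleles, and per-sample `GT` genotypes. It assumes uncompressed, tab-delimited input whose FORMAT column contains a `GT` field; the sample IDs and rows are hypothetical. Real analyses would more likely use an established library such as pysam or cyvcf2, which also handle bgzip compression and indexed region queries.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VariantRecord:
    chrom: str
    pos: int
    ref: str
    alt: List[str]
    genotypes: Dict[str, str]  # sample ID -> GT string, e.g. "0|1"


def parse_vcf_lines(lines: List[str]) -> List[VariantRecord]:
    """Parse uncompressed VCF text lines into simple records (sketch only)."""
    samples: List[str] = []
    records: List[VariantRecord] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and "##" meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the nine fixed columns
            continue
        format_keys = fields[8].split(":")
        gt_index = format_keys.index("GT")  # assumes a GT field is present
        genotypes = {
            sample: value.split(":")[gt_index]
            for sample, value in zip(samples, fields[9:])
        }
        records.append(
            VariantRecord(
                chrom=fields[0],
                pos=int(fields[1]),
                ref=fields[3],
                alt=fields[4].split(","),  # multiple ALT alleles are comma-separated
                genotypes=genotypes,
            )
        )
    return records


# Hypothetical toy input; positions, alleles, and sample IDs are invented.
EXAMPLE = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\tSAMPLE2",
    "1\t1000\t.\tA\tC\t100\tPASS\t.\tGT\t1|0\t0|0",
    "1\t2000\t.\tT\tG,C\t100\tPASS\t.\tGT:DP\t0|2:12\t0|0:9",
]

if __name__ == "__main__":
    for rec in parse_vcf_lines(EXAMPLE):
        print(rec.chrom, rec.pos, rec.ref, rec.alt, rec.genotypes)
```

Keeping genotypes as raw `GT` strings keeps the sketch small; allele indices such as the `2` in `0|2` refer to the second entry in the ALT list.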

Ethical, legal, and social implications

The project operated under strict ethical guidelines developed in consultation with bodies such as the National Institutes of Health and the Wellcome Trust. A major consideration was the reuse of samples collected by earlier initiatives such as the International HapMap Project, which required ensuring that the original informed consent permitted future genomic research. Issues of genetic privacy, data anonymization, and the potential stigmatization of populations were carefully addressed. The project's governance model, which drew on the ethical principles of the World Medical Association's Declaration of Helsinki, set important precedents for subsequent large-scale genomics efforts such as the UK Biobank and the All of Us Research Program.

Category:Human genetics Category:Genomics projects Category:2008 in science