1001 Genomes Project

Name	1001 Genomes Project
Organism	Arabidopsis thaliana
Status	Completed
Start	2008
End	2016
Collaborators	European Molecular Biology Laboratory; Max Planck Society; Wellcome Trust Sanger Institute

Contents

Overview
History and Objectives
Methodology and Data Collection
Key Findings and Impact
Data Access and Resources
Collaborative Networks and Funding

The 1001 Genomes Project was an international sequencing effort to catalogue natural genetic variation in Arabidopsis thaliana. The initiative linked major institutions such as the European Molecular Biology Laboratory, the Wellcome Trust Sanger Institute, and the Max Planck Society. It used sequencing technologies from Illumina and Pacific Biosciences and deposited its data in the European Nucleotide Archive. The project produced population-scale genomic resources used by researchers at Harvard University, the University of California, and the National Center for Biotechnology Information.

Overview

The project generated whole-genome sequences for more than a thousand natural accessions of Arabidopsis thaliana (1,135 in the 2016 release). These sequences enabled comparative analyses by groups at the University of Oxford, University of Cambridge, and ETH Zurich, and informed studies at the John Innes Centre, Rothamsted Research, and the Max Planck Institute for Developmental Biology. The data contributed to analyses by teams associated with the Broad Institute, Cold Spring Harbor Laboratory, and the Smithsonian Institution, and were integrated into databases used by the National Institutes of Health and the European Bioinformatics Institute. The results influenced plant biology work at the Sanger Institute and experimental follow-ups at Kyoto University, Stanford University, and the University of Toronto.

History and Objectives

The project was initiated after discussions involving scientists from the Royal Society, the European Research Council, and funding bodies such as the Wellcome Trust and the German Research Foundation. It aimed to extend insights from the Arabidopsis Genome Initiative and to complement studies by the 1000 Genomes Project and the Human Genome Project. Principal investigators convened at meetings in Paris, Berlin, and Stockholm with participants from the University of Helsinki, University of Barcelona, and Wageningen University to set three aims: document sequence polymorphism, map structural variation, and connect genotypes to phenotypes. These aims were intended to facilitate work by researchers at Princeton University, Yale University, and Columbia University.

Methodology and Data Collection

Sampling strategies drew on collections curated by herbaria at the Royal Botanic Gardens, Kew, the Natural History Museum, and the New York Botanical Garden, with seeds provided by the Arabidopsis Biological Resource Center and the European Arabidopsis Stock Centre. Sequencing pipelines used Illumina HiSeq platforms and long-read data from PacBio instruments at the Wellcome Trust Sanger Institute. Reads were aligned to the reference genome and variants were called using tools developed at the Broad Institute and at the European Molecular Biology Laboratory, including EMBL-EBI. Bioinformatic workflows incorporated software from the Genome Analysis Toolkit community, contributions from the OpenSNP community, and comparative frameworks employed by groups at the Max Planck Institute for Evolutionary Biology and the University of Groningen.
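
The alignment and variant-calling step can be illustrated with a toy example. The sketch below is not the project's pipeline; it is a minimal pileup-based caller written for illustration, assuming gap-free reads that are already aligned and a largely homozygous (selfing) organism. The function name `call_snps` and the thresholds `min_depth` and `min_alt_fraction` are hypothetical choices, not parameters taken from the project.

```python
from collections import Counter


def call_snps(reference, reads, min_depth=3, min_alt_fraction=0.8):
    """Call homozygous SNPs from gap-free aligned reads (toy model).

    reference: reference sequence as a string.
    reads: iterable of (start, sequence), with a 0-based start on the reference.
    Returns a list of (position, ref_base, alt_base, depth) tuples.
    """
    # Build a pileup: one base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call (hypothetical threshold)
        base, n = counts.most_common(1)[0]
        # Report a SNP when the dominant base differs from the reference
        # and is supported by a large enough fraction of the reads.
        if base != reference[pos] and n / depth >= min_alt_fraction:
            calls.append((pos, reference[pos], base, depth))
    return calls


# Illustrative data: four short reads supporting a T->A change at position 3.
reference = "ACGTACGTAC"
reads = [(0, "ACGAAC"), (2, "GAACGT"), (3, "AACGTAC"), (1, "CGAACG")]
snps = call_snps(reference, reads)
print(snps)
```

Production callers such as those in the Genome Analysis Toolkit additionally model base qualities, mapping qualities, indels, and genotype likelihoods, none of which this sketch attempts.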

Key Findings and Impact

Analyses revealed patterns of nucleotide diversity, linkage disequilibrium, and local adaptation. These patterns refined ideas in evolutionary biology associated with Ernst Mayr, Theodosius Dobzhansky, and Motoo Kimura, and informed ecological genetics work at the University of California, Davis, and the University of Florida. The discovery of copy-number variation and structural rearrangements paralleled findings in maize research at Iowa State University and rice studies at the International Rice Research Institute, while insights into flowering-time genes were linked to experiments at the John Innes Centre and the Salk Institute. The data underpinned genome-wide association studies led by teams at the University of Chicago, ETH Zurich, and the Max Planck Institute, and influenced breeding-related research at Corteva Agriscience and the International Maize and Wheat Improvement Center.
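
Two of the summary statistics named above, nucleotide diversity and linkage disequilibrium, can be made concrete with a short sketch. It assumes biallelic sites coded 0/1 and treats each inbred accession as a single haplotype; the function names and the four-accession matrix are invented for illustration and are not project data.

```python
def nucleotide_diversity(haplotypes, sequence_length):
    """Average pairwise differences per site (pi) for 0/1-coded haplotypes.

    haplotypes: list of equal-length lists, one per accession,
                containing only the variable (segregating) sites.
    sequence_length: total number of sites surveyed, variable or not.
    """
    n = len(haplotypes)
    total = 0.0
    for j in range(len(haplotypes[0])):
        p = sum(h[j] for h in haplotypes) / n
        # Unbiased per-site heterozygosity: 2p(1-p) * n/(n-1).
        total += 2 * p * (1 - p) * n / (n - 1)
    return total / sequence_length


def ld_r2(haplotypes, i, j):
    """Squared allele-frequency correlation (r^2) between sites i and j."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when either site is monomorphic
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Hypothetical data: 4 accessions, 3 variable sites in a 100 bp window.
haplotypes = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
pi = nucleotide_diversity(haplotypes, sequence_length=100)
r2_linked = ld_r2(haplotypes, 0, 1)    # sites 0 and 1 always co-occur
r2_unlinked = ld_r2(haplotypes, 0, 2)  # sites 0 and 2 are independent here
print(pi, r2_linked, r2_unlinked)
```

In this toy matrix the first two sites are perfectly correlated and the third is independent of them, which is the kind of contrast that LD decay analyses summarise across physical distance.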

Data Access and Resources

Sequencing reads and assemblies were deposited in repositories maintained by the European Nucleotide Archive, the Sequence Read Archive at the National Center for Biotechnology Information, and data portals run by the European Bioinformatics Institute and the Wellcome Trust Sanger Institute. Community tools and browsers were developed by contributors from the Broad Institute, Ensembl Plants, Gramene, and UCSC Genome Browser teams, enabling integration with datasets from the 1000 Genomes Project, the ENCODE Project, and the Plant Ontology Consortium. Training materials and workshops were offered in collaboration with EMBL-EBI, Cold Spring Harbor Laboratory, and the International Society for Computational Biology.

Collaborative Networks and Funding

The consortium included researchers affiliated with institutions such as the Max Planck Society, the Wellcome Trust Sanger Institute, the European Molecular Biology Laboratory, the John Innes Centre, and universities across Europe, North America, and Asia. It operated with support from funders including the Wellcome Trust, the European Research Council, the German Research Foundation, and national science foundations. Collaborative links extended to repositories and initiatives like the Arabidopsis Biological Resource Center, the European Arabidopsis Stock Centre, the International Maize and Wheat Improvement Center, and global networks convened by the Royal Society and the National Science Foundation.

Category:Genomics projects
Category:Arabidopsis thaliana
Category:Bioinformatics