1001 Genomes Project for Arabidopsis thaliana

Name	1001 Genomes Project for Arabidopsis thaliana
Start	2008
Organism	Arabidopsis thaliana
Coordinating institutions	European Molecular Biology Laboratory, Max Planck Society, Wellcome Trust Sanger Institute
Data release	2016

Contents

Background
Objectives and Scope
Methods and Data Collection
Major Findings and Impact
Data Access and Resources
Collaborations and Funding

The 1001 Genomes Project for Arabidopsis thaliana was a large-scale genomic survey that catalogued natural variation in the model plant Arabidopsis thaliana across global collections, integrating high-throughput sequencing with population genetics and functional genomics. The initiative brought together research groups from institutions such as the European Molecular Biology Laboratory, the Wellcome Trust Sanger Institute, the Max Planck Society, and national botanical collections to generate a dense map of single-nucleotide polymorphisms and structural variants for use by the plant science community.

Background

The project originated in the context of model organism genomics following the completion of the Arabidopsis thaliana genome sequence. It built on earlier efforts, including precursor studies by the 1001 Genomes Consortium and regional surveys led by groups at the University of Oxford, the University of Cambridge, and the John Innes Centre. Influences included population-scale sequencing projects such as the Human Genome Project and the 1000 Genomes Project (human), as well as plant-focused consortia at the European Bioinformatics Institute and the National Center for Biotechnology Information. The initiative responded to calls from communities represented at meetings of the International Conference on Arabidopsis Research and at workshops organized by the Gordon Research Conferences. These communities urged a move from single-reference frameworks toward the pan-genome perspectives championed by researchers at the Carnegie Institution for Science and the Max Planck Institute for Plant Breeding Research.

Objectives and Scope

The primary objectives were to sample global natural variation, map the genetic diversity underlying adaptive traits, and provide publicly accessible variant catalogs to researchers at the Sainsbury Laboratory, the Institut National de la Recherche Agronomique, and universities such as Harvard University and the University of California, Berkeley. The scope encompassed whole-genome resequencing of accessions assembled from repositories including the Arabidopsis Biological Resource Center, the European Arabidopsis Stock Centre (NASC), and the Royal Botanic Gardens, Kew. The resulting data were intended to inform studies of flowering time, stress responses, and developmental genetics by groups at the Max Planck Institute for Developmental Biology and the John Innes Centre.

Methods and Data Collection

Sampling pipelines combined field collections from regions such as the Iberian Peninsula, Sweden, and North Africa with curated strains from herbaria at the Natural History Museum, London, and the Muséum national d'Histoire naturelle (Paris), coordinated by teams at the Royal Botanic Gardens, Kew. Sequencing strategies employed platforms developed by companies such as Illumina, with analytical workflows running on infrastructure at the European Bioinformatics Institute, the Wellcome Trust Sanger Institute, and the National Institutes of Health. Variant calling and assembly methods adopted algorithms described in publications from groups at the Broad Institute, the Max Planck Institute for Biology, and the University of Tokyo. These methods integrated read mapping, de novo assembly, and structural variant detection as used by researchers affiliated with the European Molecular Biology Laboratory and the John Innes Centre.

Major Findings and Impact

The project reported extensive nucleotide diversity, signatures of local adaptation, and recurrent structural variation, informing trait mapping efforts by laboratories at Cold Spring Harbor Laboratory, the Sainsbury Laboratory, and ETH Zurich. Results influenced quantitative genetics studies at Stanford University, gene-editing experiments at the Max Planck Institute for Chemical Ecology, and ecological genomics work by teams at the Smithsonian Institution and the University of California, Davis. The discovery of geographically restricted alleles altered interpretations in landmark studies led by investigators from the University of Chicago, Utrecht University, and the University of Geneva. Pan-genome discussions stimulated follow-up initiatives at the European Molecular Biology Laboratory and the Sanger Institute examining structural variation and presence–absence polymorphisms.
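Nucleotide diversity, mentioned above, is commonly summarized as π, the average number of pairwise differences per site among aligned sequences. The following is a minimal sketch of that calculation, not the project's own pipeline. The function name `nucleotide_diversity` and the toy sequences are hypothetical, and the sketch assumes equal-length, gap-free alignments with one sequence per accession.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Return pi: mean pairwise differences per site.

    Assumes (for illustration only) that every sequence is already
    aligned, has the same length, and contains no gaps or missing calls.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")

    # Compare every unordered pair of sequences site by site.
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(first, second))
        for first, second in pairs
    )
    # Normalize by the number of pairs and the number of sites.
    return total_differences / (len(pairs) * length)


# Hypothetical toy data: four accessions, eight aligned sites.
toy_accessions = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(toy_accessions)
print(f"pi = {pi:.3f}")
```

Real analyses typically work from variant calls rather than full alignments and must account for missing data and uncallable sites, which this sketch ignores.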

Data Access and Resources

Sequence reads, variant catalogs, and metadata were deposited in public repositories managed by the European Nucleotide Archive, the National Center for Biotechnology Information, and the DNA Data Bank of Japan, with cloud-enabled access provided through platforms coordinated by the European Bioinformatics Institute and the Galaxy Project. Analysis-ready datasets and browser tools, adopted by users at the Arabidopsis Biological Resource Center and The Arabidopsis Information Resource (TAIR), enabled community use by researchers at the Max Planck Society, the John Innes Centre, and numerous university groups worldwide.
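As a rough illustration of how a variant catalog might be consumed, the sketch below computes an alternate-allele frequency from a single record. It assumes the catalog is in the widely used tab-delimited VCF layout with a `GT` genotype field. The article does not specify the file format, so this is an assumption, and the function name `alt_allele_frequency` and the sample record are hypothetical.

```python
def alt_allele_frequency(vcf_line):
    """Parse one VCF data line and return (chrom, pos, ref, alt, frequency).

    Assumes the standard VCF column order: CHROM, POS, ID, REF, ALT,
    QUAL, FILTER, INFO, FORMAT, then one column per sample.
    Missing genotype calls (".") are skipped.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]

    # Locate the genotype (GT) entry within the FORMAT column.
    gt_index = fields[8].split(":").index("GT")

    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt_count += 1

    frequency = alt_count / called if called else None
    return chrom, pos, ref, alt, frequency


# Hypothetical record with four accessions, one of them uncalled.
record = "1\t1000\t.\tA\tT\t50\tPASS\t.\tGT:DP\t0|0:12\t1|1:9\t./.:0\t1|1:15"
result = alt_allele_frequency(record)
print(result)
```

For full-size files, an established VCF library is generally a better choice than hand-written parsing, since it also handles headers, multi-allelic sites, and compressed input.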

Collaborations and Funding

The consortium comprised contributors from academic institutions such as the University of Oxford, the University of Cambridge, and the John Innes Centre, and from research institutes including the European Molecular Biology Laboratory and the Wellcome Trust Sanger Institute. Project funding was provided by agencies and foundations such as the Wellcome Trust, the European Research Council, and national research councils including UK Research and Innovation, with collaborative support from the Max Planck Society and regional botanical collections. Collaborative governance drew on models used by the Human Genome Project and on partnerships exemplified by the International Rice Informatics Consortium, enabling coordinated data standards and community engagement across continents.
