Earth BioGenome Project

Name	Earth BioGenome Project
Established	2018
Type	International research project

Contents

Background and Goals
Organization and Funding
Methodology and Technologies
Progress and Milestones
Scientific and Conservation Impact
Ethical, Legal, and Social Issues

The Earth BioGenome Project (EBP) is a global scientific initiative to sequence, catalog, and analyze the genomes of all known eukaryotic species on Earth. Launched in 2018, the project brings together a coalition of researchers, institutions, and funders to accelerate biodiversity genomics and inform conservation, agriculture, medicine, and basic biology. It connects large-scale sequencing centers, natural history museums, botanical gardens, and computational consortia to generate high-quality reference genomes for species spanning animals, plants, fungi, and protists.

Background and Goals

The project was proposed by leaders from institutions such as the Wellcome Trust Sanger Institute, Broad Institute, Royal Botanic Gardens, Kew, Smithsonian Institution, California Academy of Sciences, and J. Craig Venter Institute. Its goals include creating reference genomes for representatives of all ~2 million described eukaryotic species and enabling comparative genomics across the tree of life. It aligns with initiatives led by organizations such as the GigaScience community, National Center for Biotechnology Information, European Bioinformatics Institute, DNA Zoo, and the Consortium for the Barcode of Life to integrate sequence, specimen, and metadata resources. Founders emphasized synergies with legacy projects such as the Human Genome Project, the 1000 Genomes Project, the Earth Microbiome Project, and the Biodiversity Genomics Initiative to scale up sequencing, annotation, and data sharing. Objectives explicitly include generating chromosome-scale assemblies; linking genomes to voucher specimens curated at institutions such as the American Museum of Natural History, the Natural History Museum, London, and the Muséum national d'Histoire naturelle; and developing standards for sampling, curation, and open data comparable to those of the International Nucleotide Sequence Database Collaboration.

Organization and Funding

Governance involves partnerships among research consortia, public funders such as the National Science Foundation, National Institutes of Health, European Commission, and Natural Environment Research Council, and philanthropic funders such as the Chan Zuckerberg Initiative and the Gordon and Betty Moore Foundation. Major sequencing hubs include facilities at the Wellcome Sanger Institute, the Broad Institute, and the Baylor College of Medicine Human Genome Sequencing Center, along with collaborations with Pacific Biosciences and Oxford Nanopore Technologies. Coordination draws on networks such as the Global Genome Biodiversity Network, International Barcode of Life Consortium, and Global Biodiversity Information Facility, and on regional nodes such as the Chinese Academy of Sciences, Australian Museum, South African National Biodiversity Institute, and Instituto Nacional de Biodiversidad (INBio). Funding and partnership arrangements resemble those used by the Horizon 2020 program, the Bill & Melinda Gates Foundation, and national research councils including the Deutsche Forschungsgemeinschaft.

Methodology and Technologies

The project uses sequencing platforms from Pacific Biosciences, Oxford Nanopore Technologies, and Illumina, along with novel approaches promoted by groups at the Broad Institute and Wellcome Sanger Institute. Pipelines incorporate sample collection standards used by the Global Genome Biodiversity Network, specimen vouchering practices from the Smithsonian Institution and the Natural History Museum, London, and metadata standards championed by the Global Biodiversity Information Facility and the International Nucleotide Sequence Database Collaboration. Analytical workflows use tools and resources from the Genome Aggregation Database, Ensembl, RefSeq, and UniProt, and computational infrastructures such as XSEDE, the European Open Science Cloud, and cloud providers collaborating with the European Molecular Biology Laboratory. High-contiguity assemblies combine long-read sequencing, linked-read technologies, chromatin conformation capture protocols such as Hi-C, popularized by groups at the Broad Institute, and scaffolding methods developed by teams at the Wellcome Sanger Institute. Annotation pipelines integrate expertise from the Gene Ontology Consortium, Pfam, and the Ensembl Genomes team.

Progress and Milestones

Early milestones included collaborations with the Tree of Life Programme at the Wellcome Sanger Institute, reference genomes published by teams associated with the Vertebrate Genomes Project, and species assemblies released by groups such as the Zoonomia Project. Notable outputs include chromosome-scale assemblies for model and non-model organisms sequenced at the Broad Institute, the Baylor College of Medicine, and the Sanger Institute, and linked voucher specimens accessioned at institutions such as the American Museum of Natural History and the Natural History Museum, London. Milestones mirror achievements from historical large-scale efforts including the Human Genome Project, the Encyclopedia of DNA Elements (ENCODE) project, and the 1000 Genomes Project. Regional accomplishments have been reported by teams at the Chinese Academy of Sciences, Monash University, University of California, Davis, University of Oxford, and University of British Columbia, while community resources have been enriched by contributions from the International Barcode of Life Consortium and the Global Genome Biodiversity Network.

Scientific and Conservation Impact

The initiative enables comparative studies across clades, supported by researchers at Harvard University, Stanford University, University of Cambridge, Massachusetts Institute of Technology, and University of California, Berkeley. Outputs feed into conservation planning by organizations such as the International Union for Conservation of Nature, BirdLife International, World Wildlife Fund, and the Convention on Biological Diversity to prioritize species and habitat protection. Genomic datasets inform agricultural improvement pursued by teams at CIMMYT, the International Rice Research Institute, and the Consultative Group on International Agricultural Research, while biomedical insights have emerged through comparative genomics involving centers such as the National Institutes of Health and the Wellcome Sanger Institute. Databases and analytical frameworks developed by the Ensembl team, NCBI, and the European Bioinformatics Institute, together with contributions from natural history collections worldwide, bolster research across ecology, evolution, and systematics.

Ethical, Legal, and Social Issues

The project navigates complex frameworks, including international agreements such as the Nagoya Protocol and the Convention on Biological Diversity, while engaging stakeholders from Indigenous communities, museums such as the Smithsonian Institution, and national authorities including the Ministry of Science and Technology (China). Data sharing policies balance open-access ideals promoted by the International Nucleotide Sequence Database Collaboration and funders such as the Wellcome Trust against sovereign access concerns articulated by nations and groups represented at forums such as the Convention on Biological Diversity meetings. Ethical oversight draws on precedents set by the Human Genome Project, policy analyses from the Nuffield Council on Bioethics, and frameworks developed with input from organizations such as the World Health Organization, the United Nations Educational, Scientific and Cultural Organization, and the Committee on Publication Ethics. Capacity-building efforts involve partnerships with regional institutions including the African Union Commission, Latin American Society of Bioinformatics, European Molecular Biology Laboratory, and national research councils to ensure equitable participation and benefit sharing.
