European Nucleotide Archive

Title	European Nucleotide Archive
Country	United Kingdom
Established	1980s
Provider	European Molecular Biology Laboratory – European Bioinformatics Institute
Access	Open
Discipline	Genomics
Formats	Nucleotide sequences, assemblies, raw reads, metadata

Contents

Overview
History and development
Content and data types
Submission and accessioning
Data access and retrieval tools
International collaboration and standards
Governance and funding

The European Nucleotide Archive (ENA) is a central repository for nucleotide sequence data hosted at the European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI). It supports data deposition and public access to sequences generated by projects such as the Human Genome Project, the 1000 Genomes Project, and the Earth Microbiome Project, and serves researchers affiliated with institutions such as the Wellcome Trust, the Max Planck Society, and the University of Cambridge. Major users include consortia such as the Global Alliance for Genomics and Health and research centers such as Cold Spring Harbor Laboratory and the Broad Institute. The archive is itself a member of the International Nucleotide Sequence Database Collaboration.

Overview

The archive aggregates sequence submissions from organizations including the National Institutes of Health, the National Center for Biotechnology Information, and the DNA Data Bank of Japan, and integrates datasets from projects led by the Francis Crick Institute, the Wellcome Sanger Institute, the European Space Agency, and European Research Council grantees. It catalogs entries tied to model organisms studied at institutions such as the Massachusetts Institute of Technology, Harvard University, the University of Oxford, University College London, and Karolinska Institutet, and it links to resources such as Ensembl, UniProt, and PubMed Central. Stakeholders include funders such as the Wellcome Trust and the European Commission (through programmes such as Horizon 2020), and philanthropic groups such as the Gordon and Betty Moore Foundation and the Chan Zuckerberg Initiative.

History and development

The archive's origins trace to initiatives at the European Molecular Biology Laboratory and to collaborations with the National Center for Biotechnology Information and the DNA Data Bank of Japan during the Human Genome Project era, concurrent with work by researchers at Cold Spring Harbor Laboratory and the Max Planck Society. It expanded alongside projects such as the 1000 Genomes Project, the International HapMap Project, and ENCODE, with technical developments influenced by teams at the Broad Institute, the Wellcome Sanger Institute, and EMBL-EBI. Governance evolved through engagement with UNESCO, the OECD, and the Global Alliance for Genomics and Health, and through standards discussions among the International Nucleotide Sequence Database Collaboration partners. Major milestones coincided with conferences hosted by EMBL-EBI, meetings at European Commission venues, workshops with the National Human Genome Research Institute, and policy dialogues featuring leaders from the Royal Society and the European Research Council.

Content and data types

The archive stores raw reads from sequencing platforms produced by Illumina, Pacific Biosciences, and Oxford Nanopore Technologies, as well as assembled genomes, transcriptomes, metagenomes, and marker gene sequences used in studies by the Earth Microbiome Project, the Human Microbiome Project, and Tara Oceans. Metadata standards reference schemas developed by the Genomic Standards Consortium, the Minimum Information about any (x) Sequence (MIxS) working groups, and initiatives from Research Councils UK. Sample provenance often cites collections from the Natural History Museum, the Smithsonian Institution, Kew Gardens, and museums associated with the University of Copenhagen and the University of Helsinki. Linked resources include protein annotations from UniProt, pathway data from Reactome, and variant catalogs from ClinVar and dbSNP.

Submission and accessioning

Submitters include research groups at the University of Edinburgh, the University of Manchester, Imperial College London, the University of Barcelona, and ETH Zurich, often funded by agencies such as UK Research and Innovation, the Deutsche Forschungsgemeinschaft, and the Agence Nationale de la Recherche. The accessioning workflow parallels practice at the National Center for Biotechnology Information and the DNA Data Bank of Japan: accession numbers are assigned to projects (studies), samples, sequence read records (experiments and runs), and assembled or annotated sequence records, comparable to the BioProject, BioSample, Sequence Read Archive, and GenBank records used by projects such as the 1000 Genomes Project and the International HapMap Project. Policies align with open data mandates from the Wellcome Trust and the European Commission's Horizon 2020 programme and with NIH data sharing policies, and are influenced by legal frameworks discussed at institutions such as the European Court of Human Rights and at Council of Europe meetings.
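The accession types described above can be told apart by their prefixes. The following is a minimal sketch, assuming a simplified subset of the accession formats used across the collaborating archives (for example `PRJEB…` for projects and `ERR…` for runs). The function name `classify_accession` and the pattern table are hypothetical and for illustration only; the archive's own documentation is the authoritative list of formats.

```python
import re

# Simplified, illustrative patterns (an assumption, not an exhaustive list).
# The first letter after the type prefix indicates the archive that issued
# the accession: E (ENA), D (DDBJ), or S/N (NCBI).
ACCESSION_PATTERNS = {
    "project": re.compile(r"PRJ[EDN][A-Z]\d+"),
    "study": re.compile(r"[EDS]RP\d{6,}"),
    "biosample": re.compile(r"SAM[EDN][A-Z]?\d+"),
    "sample": re.compile(r"[EDS]RS\d{6,}"),
    "experiment": re.compile(r"[EDS]RX\d{6,}"),
    "run": re.compile(r"[EDS]RR\d{6,}"),
}


def classify_accession(accession):
    """Return the record type for an accession string, or None if unrecognized."""
    text = accession.strip().upper()
    for kind, pattern in ACCESSION_PATTERNS.items():
        # fullmatch rejects strings with trailing or leading extra characters.
        if pattern.fullmatch(text):
            return kind
    return None
```

For example, `classify_accession("ERR000001")` returns `"run"`, and a string that matches none of the patterns returns `None`. A lookup like this is useful mainly as an input check before querying the archive; it does not confirm that a record actually exists.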

Data access and retrieval tools

Users access data via web interfaces and programmatic APIs developed at EMBL-EBI, with tools interoperable with Ensembl, the UCSC Genome Browser (developed at the University of California, Santa Cruz), Galaxy, and IGV (developed at the Broad Institute). Large-scale retrieval leverages cloud partnerships with Amazon Web Services and Google Cloud Platform, and collaborations with ELIXIR nodes in countries including France, Germany, and Italy. Analysis pipelines reference software such as Bioconductor, SAMtools, BCFtools, and BLAST implementations used by researchers at the National Institutes of Health and by academic groups such as those at the University of Washington. Training resources are provided through EMBL-EBI training, Cold Spring Harbor Laboratory courses, and MOOCs hosted by Coursera and edX.
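Programmatic access of the kind described above might look like the following minimal sketch. It assumes the ENA Portal API is served at `https://www.ebi.ac.uk/ena/portal/api`, that it offers a `filereport` endpoint, and that it accepts the `accession`, `result`, `fields`, and `format` parameters; these details should be checked against the current EMBL-EBI documentation. The helper names `build_filereport_url`, `parse_tsv`, and `fetch_filereport` are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the ENA Portal API; verify against current documentation.
PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def build_filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a file-report URL listing the read files for a project or run."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",        # assumed result type for raw read runs
        "fields": ",".join(fields),  # columns requested in the report
        "format": "tsv",
    })
    return f"{PORTAL_API}/filereport?{query}"


def parse_tsv(text):
    """Parse a tab-separated report into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_filereport(accession, timeout=30):
    """Download and parse a file report (performs a network request)."""
    with urlopen(build_filereport_url(accession), timeout=timeout) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Keeping URL construction and TSV parsing separate from the network call lets both be checked offline. For bulk retrieval, a script would typically iterate over the parsed rows and download each listed file.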

International collaboration and standards

The archive participates in the International Nucleotide Sequence Database Collaboration alongside the National Center for Biotechnology Information and the DNA Data Bank of Japan. It coordinates with the World Health Organization on pathogen data sharing during outbreaks and aligns its standards with those of the Genomic Standards Consortium and the Global Alliance for Genomics and Health. Collaborative projects involve institutions such as Institut Pasteur, RIKEN, the Chinese Academy of Sciences, and CSIRO, and standards discussions occur in venues such as Gordon Research Conferences, Keystone Symposia, and meetings of the Royal Society. Data policies reflect input from funders including the Wellcome Trust, the Bill & Melinda Gates Foundation, and the European Commission, and developers collaborate with software projects from EMBL-EBI, the Broad Institute, and the Genome Institute at Washington University.

Governance and funding

Operational governance is handled by EMBL-EBI, with oversight from stakeholders including European Molecular Biology Laboratory member states, the Wellcome Trust, the European Commission, and national research funders such as UK Research and Innovation (UKRI) and the Deutsche Forschungsgemeinschaft (DFG). Funding sources include Horizon Europe, UKRI, Wellcome Trust grants, and institutional support from the University of Cambridge and the European Molecular Biology Laboratory, with collaborations receiving project funding from Charité–Universitätsmedizin Berlin, Karolinska Institutet, and the Max Planck Society. Advisory interactions occur with organizations such as the OECD, UNESCO, and the Global Alliance for Genomics and Health, while infrastructure partnerships involve the cloud providers Amazon Web Services and Google Cloud Platform.

Category:Biological databases