European Genome-phenome Archive

Name	European Genome-phenome Archive
Type	Bioinformatics repository
Established	2008
Location	Hinxton, Cambridgeshire
Coordinates	52.0800°N 0.1770°E
Parent organization	European Molecular Biology Laboratory–European Bioinformatics Institute

The European Genome-phenome Archive is a secure repository for human genomic and phenotypic data, established to enable controlled access to sensitive datasets generated by biomedical projects. It was created to balance the data sharing obligations of projects such as the 1000 Genomes Project, the International Cancer Genome Consortium, and UK Biobank against privacy protection requirements articulated in instruments like the Declaration of Helsinki and the General Data Protection Regulation. The Archive works closely with institutions including the Wellcome Sanger Institute, the European Molecular Biology Laboratory, and national genomic initiatives across France, Germany, Italy, Spain, and Sweden.

Overview

The Archive stores raw and processed sequence data, variant calls, array genotypes, and extensive phenotypic annotations arising from cohorts, case–control studies, and clinical trials supported by funders such as the National Institutes of Health and the European Research Council, by programmes such as Horizon 2020, and by disease-focused consortia like The Cancer Genome Atlas and the Alzheimer's Disease Neuroimaging Initiative. It implements controlled-access mechanisms aligned with the review procedures of data access committees associated with funders like the Wellcome Trust and agencies such as the Medical Research Council. The Archive interoperates with infrastructures such as ELIXIR, the Global Alliance for Genomics and Health, and regional nodes including ELIXIR-UK and ELIXIR-NL.

The repository was launched in response to increasing volumes of human genomic data from projects including the 1000 Genomes Project, the Human Genome Project, and national sequencing programs like Genomics England. Early development involved partnerships between the European Bioinformatics Institute and funders such as the Wellcome Trust. Subsequent milestones included integration with cloud compute platforms offered by vendors like Amazon Web Services and collaborations with initiatives such as dbGaP and the International Nucleotide Sequence Database Collaboration. Governance evolved through interactions with policy bodies including the European Commission and advisory groups comprising stakeholders from the European Medicines Agency and academic centers like Harvard Medical School and the University of Cambridge.

Content spans whole-genome sequencing from consortia such as the International Cancer Genome Consortium, sequence data from clinical initiatives like the 100,000 Genomes Project, and phenotype-rich cohort data from sources like ALSPAC and the Danish National Biobank. Access is managed by data access committees, through procedures analogous to those at dbGaP, and is guided by participant consent models developed by ethics boards at institutions such as Johns Hopkins University and Karolinska Institutet. Policies reflect legal frameworks including the General Data Protection Regulation and recommendations from advisory bodies such as the Nuffield Council on Bioethics and the European Data Protection Board.

Technical infrastructure is provided by the European Molecular Biology Laboratory–European Bioinformatics Institute (EMBL-EBI) facility at the Wellcome Genome Campus near Cambridge, with storage, metadata curation, and secure access workflows modelled on resources like the European Nucleotide Archive (ENA) and ArrayExpress. Submitters from research centers including the Broad Institute, the Wellcome Sanger Institute, and university hospitals such as Addenbrooke's Hospital follow submission pipelines that require standardized metadata schemas, developed in coordination with standards organizations like the Global Alliance for Genomics and Health and drawing on ontologies such as the Human Phenotype Ontology. Compute integration supports analysis via cloud and local compute ecosystems such as Terra, Galaxy, and high-performance clusters at institutions such as EMBL-EBI.
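The kind of check a submission pipeline might apply to sample metadata can be sketched as follows. This is a minimal illustration, not the Archive's actual schema or validator: the field names `sample_alias`, `subject_id`, and `phenotypes` and the function `validate_sample` are hypothetical. The only external convention relied on is that Human Phenotype Ontology term identifiers take the form `HP:` followed by seven digits.

```python
import re
from typing import List

# HPO term identifiers have the form "HP:" followed by seven digits.
HPO_ID = re.compile(r"HP:[0-9]{7}")

# Hypothetical required fields, chosen for illustration only;
# this is NOT the Archive's real metadata schema.
REQUIRED_FIELDS = ("sample_alias", "subject_id", "phenotypes")


def validate_sample(record: dict) -> List[str]:
    """Return a list of problems found in one sample record (empty if none)."""
    problems = []
    # Every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing required field: {name}")
    # Every phenotype annotation must look like an HPO identifier.
    for term in record.get("phenotypes") or []:
        if not HPO_ID.fullmatch(term):
            problems.append(f"malformed HPO identifier: {term}")
    return problems
```

A record that passes returns an empty list, so a caller could reject a submission whenever the list is non-empty and report the collected messages to the submitter. The check covers identifier syntax only; it does not confirm that a term exists in the ontology.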

Governance involves stakeholders from funding bodies including the Wellcome Trust, the European Commission, and national research councils such as the Medical Research Council. Ethical oversight draws on principles from the Declaration of Helsinki, and legal compliance aligns with the General Data Protection Regulation and national laws in countries such as the United Kingdom, France, and Germany. Data access committees incorporate representatives from clinical centers such as Guy's and St Thomas' NHS Foundation Trust and academic ethics boards at the University of Oxford to adjudicate requests, reflecting models used by dbGaP and policy recommendations from the Council of Europe.

Researchers from institutions like University College London, Imperial College London, Stanford University, and the Massachusetts Institute of Technology use the Archive to advance studies in oncology, rare disease, pharmacogenomics, and population genetics. The repository has supported discoveries reported in journals such as Nature, Science, The Lancet, and Nature Genetics by enabling secondary analysis of cohorts from projects like ALSPAC, UK Biobank, and the 1000 Genomes Project. It underpins translational efforts at centers including the Francis Crick Institute and the Wellcome Sanger Institute and is cited in guidelines from organizations such as the European Society of Human Genetics.

The Archive interoperates with resources including the European Nucleotide Archive, dbGaP, ClinVar, ArrayExpress, Ensembl, and federated networks promoted by the Global Alliance for Genomics and Health (GA4GH) and ELIXIR. It aligns metadata standards with initiatives such as the BioSamples Database and ontologies like the Human Phenotype Ontology to facilitate cross-repository queries, alongside standards like the GA4GH Data Repository Service (DRS) API and workflow tooling including Dockstore and the Common Workflow Language (CWL).
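As an illustration of how the GA4GH Data Repository Service fits into cross-repository access, the sketch below turns a hostname-based DRS URI into the HTTPS URL a client would request for that object's metadata. The path prefix `/ga4gh/drs/v1/objects/` comes from the DRS specification. The host `drs.example.org`, the object identifier, and the function name `drs_uri_to_url` are placeholders and do not describe an actual Archive endpoint. Compact-identifier DRS URIs (`drs://prefix:accession`) need a separate resolution step and are not handled here.

```python
from urllib.parse import quote, urlsplit


def drs_uri_to_url(drs_uri: str) -> str:
    """Map drs://<hostname>/<id> to https://<hostname>/ga4gh/drs/v1/objects/<id>."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri}")
    # The object identifier is everything after the leading slash.
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object identifier: {drs_uri}")
    # Percent-encode the identifier so it is safe as a single path segment.
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


# Placeholder host and identifier, for illustration only.
print(drs_uri_to_url("drs://drs.example.org/abc123"))
```

For controlled-access data, a client would additionally need to present credentials when calling the resulting URL. That step depends on the deployment and is left out of this sketch.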
