BioSamples database

Name	BioSamples
Title	BioSamples database
Maintained by	European Bioinformatics Institute
Launched	2007
Domain	biomedical sample metadata


The BioSamples database is a centralized repository for sample metadata hosted by the European Bioinformatics Institute (EMBL-EBI). It provides standardized descriptions of biological specimens and links them to molecular data held in repositories such as ArrayExpress, the European Nucleotide Archive, and the European Genome-phenome Archive. The resource supports interoperability with international initiatives including the Global Alliance for Genomics and Health and the Human Cell Atlas.

Overview

The resource connects institutions including the European Bioinformatics Institute, the European Molecular Biology Laboratory, the National Center for Biotechnology Information, the Wellcome Sanger Institute, and the Institut Pasteur with initiatives such as the Global Alliance for Genomics and Health, the Human Cell Atlas, ENCODE, and the 1000 Genomes Project. It catalogs specimen metadata used by repositories such as ArrayExpress, the European Nucleotide Archive, the European Genome-phenome Archive, the Gene Expression Omnibus, and MetaboLights, and annotates that metadata with ontology terms obtained through the Ontology Lookup Service (EMBL-EBI's ontology portal) and the OBO Foundry. Stakeholders include funding bodies such as the Wellcome Trust, the European Commission, the National Institutes of Health, the Human Frontier Science Program, and the Medical Research Council.

History and Development

Initially developed at the European Bioinformatics Institute with funding from the European Commission and the Wellcome Trust, the resource evolved alongside projects such as ENCODE, the 1000 Genomes Project, the Human Microbiome Project, the Human Cell Atlas, and the International Cancer Genome Consortium. Collaborators included the National Center for Biotechnology Information, the Broad Institute, the Wellcome Sanger Institute, and Genome Canada. Milestones included integration with ArrayExpress, the European Nucleotide Archive, and the European Genome-phenome Archive, and the adoption of standards from the Global Alliance for Genomics and Health and the OBO Foundry. Meetings and workshops brought together participants from Cold Spring Harbor Laboratory, the Broad Institute, the Wellcome Sanger Institute, and the National Human Genome Research Institute.

Database Content and Structure

Metadata records reference projects, consortia, and institutions such as ENCODE, the Human Cell Atlas, the International Cancer Genome Consortium, the 1000 Genomes Project, the Human Microbiome Project, and GTEx. The schema uses terms from ontologies made available through the OBO Foundry and the Ontology Lookup Service, including those of the Gene Ontology Consortium, and aligns with the metadata models of the European Nucleotide Archive, ArrayExpress, and MetaboLights. Records link to datasets in the Gene Expression Omnibus, the European Genome-phenome Archive, the Sequence Read Archive, and UniProt, and are cross-referenced with identifiers from ClinVar, dbGaP, COSMIC, and RefSeq. The record structure also allows submitted samples to reference their originating institutions, such as the Wellcome Sanger Institute, the Broad Institute, EMBL-EBI, and the National Center for Biotechnology Information.
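A sample record of the kind described above can be pictured as an accession plus a set of named characteristics, each value optionally carrying ontology term IRIs, together with links to external datasets. The sketch below models such a record in Python. It assumes a JSON layout with `accession`, `name`, `characteristics`, `ontologyTerms`, `unit`, and `externalReferences` keys, loosely modeled on how sample records are commonly serialized; the classes, field names, accession, and URL are illustrative assumptions, not the database's authoritative schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Attribute:
    """One value of a sample characteristic, optionally annotated with ontology IRIs."""
    text: str
    ontology_terms: List[str] = field(default_factory=list)
    unit: Optional[str] = None


@dataclass
class SampleRecord:
    """Simplified sample record; the field names are illustrative assumptions."""
    accession: str
    name: str
    characteristics: Dict[str, List[Attribute]] = field(default_factory=dict)
    external_references: List[str] = field(default_factory=list)


def parse_sample(doc: dict) -> SampleRecord:
    """Convert a JSON-like dict (assumed camelCase keys) into a SampleRecord."""
    characteristics = {
        name: [
            Attribute(
                text=value["text"],
                ontology_terms=list(value.get("ontologyTerms", [])),
                unit=value.get("unit"),
            )
            for value in values
        ]
        for name, values in doc.get("characteristics", {}).items()
    }
    references = [ref["url"] for ref in doc.get("externalReferences", [])]
    return SampleRecord(doc["accession"], doc.get("name", ""), characteristics, references)


# Hypothetical example record; the accession and URL are placeholders.
example = {
    "accession": "SAMEA0000001",
    "name": "liver biopsy 1",
    "characteristics": {
        "organism": [{"text": "Homo sapiens",
                      "ontologyTerms": ["http://purl.obolibrary.org/obo/NCBITaxon_9606"]}],
        "age": [{"text": "45", "unit": "year"}],
    },
    "externalReferences": [{"url": "https://example.org/dataset/1"}],
}

record = parse_sample(example)
print(record.characteristics["organism"][0].text)  # Homo sapiens
```

Keeping each characteristic as a list of values, rather than a single string, allows one attribute to hold several annotated values, which is a common need when samples are pooled or described at more than one level of detail.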

Data Submission and Curation

Submission pipelines accept contributions from institutes such as EMBL-EBI, the Wellcome Sanger Institute, the Broad Institute, the National Center for Biotechnology Information, and the Institut Pasteur. Curation follows guidelines influenced by the Global Alliance for Genomics and Health, by the FAIR (Findable, Accessible, Interoperable, Reusable) principles promoted by the European Commission and research councils, and by the metadata standards of ENCODE, the Human Cell Atlas, and the International Cancer Genome Consortium. To keep annotations consistent, curators coordinate with ontology groups including the Gene Ontology Consortium, the OBO Foundry, and the Ontology Lookup Service, and with repositories such as ArrayExpress, the European Nucleotide Archive, and the European Genome-phenome Archive.
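One way to picture the consistency checks that curation involves is a small validator that normalizes submitter-supplied attribute names and then tests a record against a checklist of required attributes. The Python sketch below is a minimal illustration under stated assumptions: the checklist, the synonym table, and the rule that ontology terms must be OBO PURLs are all hypothetical, and the real curation workflow is not described in detail by the source.

```python
import re
from typing import Dict, List

# Hypothetical checklist: required attribute -> whether each value must carry an ontology term.
CHECKLIST = {"organism": True, "tissue": False}

# Hypothetical synonym table for normalizing submitter-supplied attribute names.
SYNONYMS = {"species": "organism", "organism name": "organism", "tissue type": "tissue"}

# Simplified assumption: accept only OBO PURLs such as .../obo/NCBITaxon_9606.
OBO_IRI = re.compile(r"^http://purl\.obolibrary\.org/obo/[A-Za-z]+_\d+$")


def normalise_key(key: str) -> str:
    """Lower-case and trim an attribute name, then map known synonyms to a canonical name."""
    cleaned = key.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def validate(characteristics: Dict[str, List[dict]]) -> List[str]:
    """Return human-readable problems; an empty list means the record passes the checklist."""
    normalised: Dict[str, List[dict]] = {}
    for key, values in characteristics.items():
        normalised.setdefault(normalise_key(key), []).extend(values)

    problems = []
    for name, needs_term in CHECKLIST.items():
        if name not in normalised:
            problems.append(f"missing required attribute '{name}'")
            continue
        if needs_term:
            for value in normalised[name]:
                terms = value.get("ontologyTerms", [])
                if not any(OBO_IRI.match(term) for term in terms):
                    problems.append(f"'{name}' value '{value.get('text')}' lacks an OBO ontology term")
    return problems


submitted = {
    "Species": [{"text": "Homo sapiens",
                 "ontologyTerms": ["http://purl.obolibrary.org/obo/NCBITaxon_9606"]}],
    "tissue type": [{"text": "liver"}],
}
print(validate(submitted))  # []
```

Normalizing names before checking means that "Species" and "organism" are treated as the same attribute, which is the kind of harmonization the curation guidelines aim for.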

Access and Tools

Users access records through a web interface and programmatic APIs developed at the European Bioinformatics Institute; this tooling interoperates with resources such as ArrayExpress, the European Nucleotide Archive, the European Genome-phenome Archive, and the Gene Expression Omnibus. Programmatic access allows workflows to combine sample records with tools from the Broad Institute, the Wellcome Sanger Institute, EMBL-EBI, and the National Center for Biotechnology Information. Analysis and visualization pipelines can connect to the service through workflow systems and frameworks such as Galaxy, Nextflow, Snakemake, and Bioconductor, and can link sample metadata to records in UniProt, RefSeq, COSMIC, and ClinVar.
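Programmatic access typically means requesting a sample by accession over HTTP and reading a JSON response. The Python sketch below uses only the standard library. It assumes a per-sample URL of the form `https://www.ebi.ac.uk/biosamples/samples/{accession}` that returns JSON when asked with an `Accept: application/json` header, and an accession pattern built from the prefixes `SAMEA`, `SAMEG`, `SAMN`, and `SAMD`; both the endpoint and the pattern are assumptions to check against the service's own API documentation.

```python
import json
import re
import urllib.request
from typing import Optional

# Assumed base URL for per-sample JSON records.
BASE_URL = "https://www.ebi.ac.uk/biosamples/samples"

# Assumed accession format: prefix followed by digits.
ACCESSION = re.compile(r"^SAM(?:EA|EG|N|D)\d+$")


def sample_url(accession: str) -> str:
    """Build the (assumed) URL for one sample, rejecting malformed accessions early."""
    if not ACCESSION.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f"{BASE_URL}/{accession}"


def fetch_sample(accession: str, timeout: float = 30.0) -> dict:
    """Download one sample record as a dict (performs a network request)."""
    request = urllib.request.Request(sample_url(accession),
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def organism_of(sample: dict) -> Optional[str]:
    """Return the first 'organism' value, assuming a characteristics map of value lists."""
    values = sample.get("characteristics", {}).get("organism", [])
    return values[0]["text"] if values else None


# Offline usage with a hypothetical record instead of a live request.
offline = {"characteristics": {"organism": [{"text": "Mus musculus"}]}}
print(organism_of(offline))  # Mus musculus
```

Validating the accession before building the URL keeps malformed identifiers from turning into confusing HTTP errors, and separating parsing from fetching lets the parsing logic be tested without network access.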

Integration and Interoperability

Interoperability efforts align the repository with international frameworks from the Global Alliance for Genomics and Health, the OBO Foundry, the Human Cell Atlas, and ENCODE, and with FAIR initiatives promoted by the European Commission and research councils. Cross-references to ArrayExpress, the European Nucleotide Archive, the European Genome-phenome Archive, the Gene Expression Omnibus, the Sequence Read Archive, and MetaboLights enable integration with platforms at the Broad Institute, the Wellcome Sanger Institute, the National Center for Biotechnology Information, and EMBL-EBI. Shared standards and identifiers also link records to UniProt, RefSeq, dbGaP, ClinVar, COSMIC, and GTEx.
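Cross-referencing across repositories depends on recognizing which archive an external identifier belongs to. The Python sketch below routes accessions to repositories by their prefix. The regular expressions are simplified assumptions based on commonly seen formats (for example `E-MTAB-…` for ArrayExpress, `GSM`/`GSE` for the Gene Expression Omnibus, `MTBLS` for MetaboLights); they are not an official registry, and a production system would rely on each archive's documented identifier rules.

```python
import re
from typing import Dict, List

# Simplified, assumed accession patterns; checked in order, first match wins.
PATTERNS = [
    ("ArrayExpress", re.compile(r"^E-[A-Z]{4}-\d+$")),
    ("INSDC read archives (ENA/SRA/DDBJ)", re.compile(r"^[EDS]RR\d+$")),
    ("Gene Expression Omnibus", re.compile(r"^GS[EM]\d+$")),
    ("European Genome-phenome Archive", re.compile(r"^EGA[A-Z]\d{11}$")),
    ("MetaboLights", re.compile(r"^MTBLS\d+$")),
]


def classify(accession: str) -> str:
    """Return the repository whose pattern matches the accession, or 'unknown'."""
    for repository, pattern in PATTERNS:
        if pattern.match(accession):
            return repository
    return "unknown"


def group_by_repository(accessions: List[str]) -> Dict[str, List[str]]:
    """Group accessions by inferred repository, preserving input order within each group."""
    groups: Dict[str, List[str]] = {}
    for accession in accessions:
        groups.setdefault(classify(accession), []).append(accession)
    return groups


# Hypothetical cross-references attached to one sample.
print(group_by_repository(["GSM1", "MTBLS3", "GSE2"]))
```

Grouping references by repository is a convenient first step before dispatching each group to the appropriate archive's own lookup service.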

Applications and Use Cases

Researchers at institutions such as EMBL-EBI, the Broad Institute, the Wellcome Sanger Institute, Cold Spring Harbor Laboratory, and the National Human Genome Research Institute use the repository to track sample provenance in studies of cancer genomics, human genetics, the microbiome, single-cell atlasing, and transcriptomics. Consortia including ENCODE, the Human Cell Atlas, the 1000 Genomes Project, the International Cancer Genome Consortium, and the Human Microbiome Project rely on the service to harmonize sample descriptors so that they can be integrated with datasets in ArrayExpress, the European Nucleotide Archive, the Gene Expression Omnibus, and the European Genome-phenome Archive. Applied use cases include clinical variant interpretation with ClinVar and COSMIC, expression analysis with GTEx and the Gene Expression Omnibus, and multi-omics integration with MetaboLights and UniProt.
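Provenance tracking amounts to following links from a derived sample (for example a sequencing library) back to the material it came from (a tissue, then a donor). The Python sketch below walks such links breadth-first. It assumes relationships are stored as dicts with `source`, `type`, and `target` keys and that derivation is expressed with the type `"derived from"`; these conventions and the sample names are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List


def ancestors(accession: str, relationships: List[dict]) -> List[str]:
    """Return every sample reachable via 'derived from' links, in breadth-first order."""
    # Build source -> list of parent samples, ignoring other relationship types.
    parents: Dict[str, List[str]] = {}
    for rel in relationships:
        if rel.get("type") == "derived from":
            parents.setdefault(rel["source"], []).append(rel["target"])

    seen = {accession}  # guards against cycles in inconsistent metadata
    order: List[str] = []
    queue = deque([accession])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Hypothetical lineage: library derived from tissue, tissue derived from donor.
relationships = [
    {"source": "library-1", "type": "derived from", "target": "tissue-1"},
    {"source": "tissue-1", "type": "derived from", "target": "donor-1"},
    {"source": "library-1", "type": "same as", "target": "library-1-copy"},
]
print(ancestors("library-1", relationships))  # ['tissue-1', 'donor-1']
```

The `seen` set means a malformed record that accidentally links a sample back to its own descendant cannot cause an infinite loop.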
