Gene Expression Omnibus

Name	Gene Expression Omnibus
Producer	National Center for Biotechnology Information
Country	United States
Discipline	Molecular biology
Scope	Gene expression, functional genomics
Temporal coverage	2000–present

Contents

Overview
History and Development
Data Content and Organization
Submission and Accession Process
Data Access, Tools, and Querying
Standards, Formats, and Metadata
Impact and Applications

The Gene Expression Omnibus (GEO) is a public repository for high-throughput functional genomics data hosted by the National Center for Biotechnology Information (NCBI). It aggregates microarray, next-generation sequencing, and other array-based datasets submitted by research groups at universities, biotechnology companies, and government agencies. The resource supports data reuse by linking datasets to the publications, grants, and institutional resources associated with them.

Overview

The repository serves as a centralized archive connecting submitters at institutions such as Harvard University, Stanford University, the Massachusetts Institute of Technology, the University of Cambridge, and Johns Hopkins University with curators at the National Institutes of Health and its National Library of Medicine, and it holds data associated with consortia such as the ENCODE Project, the 1000 Genomes Project, and The Cancer Genome Atlas. It indexes platform, sample, and series records, along with curated DataSets, that are cited in journals such as Nature, Science, Cell, and The Lancet. Keeping these data public supports the reproducibility of studies from laboratories funded by organizations such as the National Science Foundation, the Wellcome Trust, and the Howard Hughes Medical Institute.

History and Development

The archive emerged from initiatives at federal research organizations and was launched at NCBI in 2000. Its development involved collaborations with institutions such as Cold Spring Harbor Laboratory, the European Bioinformatics Institute, and the Broad Institute, and with companies including Illumina, Affymetrix, and Agilent Technologies. Early growth paralleled milestone projects such as the Human Genome Project and the ENCODE Project, as well as the rise of expression-profiling platforms presented at forums such as American Society of Human Genetics meetings, Gordon Research Conferences, and workshops at Cold Spring Harbor Laboratory. Governance and policy evolved alongside recommendations from the editorial boards of journals such as Nature Genetics and from funders such as the National Institutes of Health and the Wellcome Trust.

Data Content and Organization

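Content types include raw and processed transcriptomic data generated with technologies from vendors such as Affymetrix, Illumina, and Thermo Fisher Scientific, using protocols standardized in resources from World Health Organization laboratory networks. Records are organized into four types, each with its own accession prefix: Platforms (GPL), which describe an array or sequencing technology; Samples (GSM), which describe a single biological sample and its measurements; Series (GSE), which group the samples of one study; and curated DataSets (GDS), which assemble comparable samples into collections ready for analysis. These accessions are cited by projects such as the GTEx Project, the 1000 Genomes Project, and the International HapMap Project, and by clinical cohorts funded by the National Cancer Institute or coordinated by institutions such as the Mayo Clinic and the Cleveland Clinic. Metadata links samples to biobanks, consortia, and studies, including collaborations with partners such as the European Molecular Biology Laboratory and the Max Planck Society.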
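GEO's submitter-supplied records form a small hierarchy: a Series (GSE) groups Samples (GSM), and each Sample points to the Platform (GPL) it was measured on. As a minimal sketch, assuming only that hierarchy, the records can be modeled as Python dataclasses. The class and field names are illustrative assumptions rather than GEO's own schema, and curated DataSet (GDS) records are omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Platform:
    """GPL record: the array design or sequencing technology used."""
    accession: str
    title: str


@dataclass(frozen=True)
class Sample:
    """GSM record: one biological sample measured on one platform."""
    accession: str
    platform: str  # accession of the GPL record this sample was run on


@dataclass
class Series:
    """GSE record: groups the samples that make up one study."""
    accession: str
    samples: list[Sample] = field(default_factory=list)

    def platforms(self) -> set[str]:
        # A series can span several platforms, so collect them as a set.
        return {sample.platform for sample in self.samples}


# Hypothetical records for illustration only; these are not real GEO entries.
array = Platform("GPL_EXAMPLE", "Example expression array")
study = Series(
    "GSE_EXAMPLE",
    [Sample("GSM_EXAMPLE_1", array.accession), Sample("GSM_EXAMPLE_2", array.accession)],
)
print(study.platforms())  # {'GPL_EXAMPLE'}
```

Storing `Sample.platform` as an accession string rather than an object reference mirrors the way GEO records refer to one another by accession.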

Submission and Accession Process

Submitters from universities and companies deposit data in accordance with policies from funders such as the National Institutes of Health and the Wellcome Trust, and with journal requirements from publishers including Cell Press, Springer Nature, and Oxford University Press. Each submission receives accession identifiers that are cited in publications in journals such as Nature, Science, and PNAS; data citations are tracked by indexing services such as PubMed and Crossref and by repositories such as Zenodo. Curatorial oversight is provided by staff trained through collaborations with organizations such as the International Society for Computational Biology and with standards bodies such as ISO working groups.
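GEO accessions carry a prefix that identifies the record type: GPL for platforms, GSM for samples, GSE for series, and GDS for curated DataSets. As a minimal sketch, assuming an accession is always one of those prefixes followed by decimal digits, a submission or citation pipeline could validate and classify identifiers as follows. The helper name `parse_accession` is hypothetical.

```python
import re

# GEO accession prefixes and the record type each denotes.
RECORD_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

# A known prefix followed by one or more digits, e.g. "GSE" + "12345".
ACCESSION_RE = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def parse_accession(accession: str) -> tuple[str, int]:
    """Return (record type, numeric part) for a GEO accession string.

    Raises ValueError if the string does not look like a GEO accession.
    """
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    prefix, digits = match.groups()
    return RECORD_TYPES[prefix], int(digits)


print(parse_accession("GSE12345"))  # ('Series', 12345)
```

Normalizing case and whitespace before matching lets the check accept accessions copied loosely from manuscripts, while identifiers from other archives are rejected with a `ValueError`.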

Data Access, Tools, and Querying

Users query the archive through web interfaces and programmatic APIs developed by NCBI, which are integrated with tools from Bioconductor, the Galaxy Project, and the UCSC Genome Browser and with visualization platforms used by Broad Institute researchers. Cross-references connect records to literature indexed in PubMed Central, to variant resources such as dbSNP, and to pathway databases including KEGG and Reactome. Community portals, including those run by the European Bioinformatics Institute and the Ensembl project, facilitate federated search across archives used by researchers at institutions such as Imperial College London and the University of California, San Francisco.
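One route to programmatic access is NCBI's Entrez E-utilities, where GEO records are indexed in the `gds` database. The sketch below assumes the public `esearch.fcgi` endpoint and its `db`, `term`, `retmax`, and `retmode` parameters. It only builds the query URL and performs no network call. The helper name `gds_search_url` and the example query term are illustrative assumptions.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gds_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL against the Entrez GEO DataSets database (db=gds)."""
    params = {
        "db": "gds",        # Entrez database that indexes GEO records
        "term": term,       # Entrez query string
        "retmax": retmax,   # maximum number of UIDs to return
        "retmode": "json",  # request JSON instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative query: series-level entries mentioning breast cancer.
url = gds_search_url("breast cancer AND gse[ETYP]")
print(url)
# Fetching is left to the caller, e.g. urllib.request.urlopen(url).
```

Separating URL construction from fetching keeps the helper testable offline. R users would more commonly reach for Bioconductor's GEOquery package, whose `getGEO()` downloads and parses records directly.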

Standards, Formats, and Metadata

Data formats align with community standards catalogued by registries such as FAIRsharing, with the MIAME (Minimum Information About a Microarray Experiment) guidelines endorsed by journals and by organizations such as EMBL-EBI and the International Society for Computational Biology, and with file formats supported by vendors such as Illumina and Affymetrix. Metadata vocabularies map to ontologies and taxonomies maintained by groups such as the Gene Ontology Consortium and resources such as NCBI Taxonomy, and to terminologies used in clinical collaborations with the World Health Organization and in regulatory frameworks referenced by agencies including the Food and Drug Administration.
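GEO also distributes records in its own line-oriented text format, SOFT (Simple Omnibus Format in Text), alongside the XML-based MINiML. In SOFT, a line beginning with `^` opens an entity, lines beginning with `!` carry attributes, lines beginning with `#` describe data-table columns, and the data table sits between marker lines such as `!sample_table_begin` and `!sample_table_end`. The sketch below is a minimal parser under the assumption that this layout holds. It collects attributes only, skips data tables, and does not replace maintained readers such as Bioconductor's GEOquery. The function name and the example record are hypothetical.

```python
from collections import defaultdict


def parse_soft_attributes(text: str) -> dict[str, dict[str, list[str]]]:
    """Collect entity attributes from SOFT-formatted text.

    Returns {entity accession: {attribute name: [values, ...]}}.
    Data-table rows and '#' column descriptions are skipped.
    """
    entities = {}
    current = None    # accession of the entity currently being read
    in_table = False  # True between *_table_begin and *_table_end
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = GSM_EXAMPLE".
            _, _, accession = line.partition("=")
            current = accession.strip()
            entities[current] = defaultdict(list)
        elif line.startswith("!") and line.lower().endswith("_table_begin"):
            in_table = True
        elif line.startswith("!") and line.lower().endswith("_table_end"):
            in_table = False
        elif line.startswith("!") and not in_table and current is not None:
            # Attribute line; the same key may repeat, so values are listed.
            key, _, value = line[1:].partition("=")
            entities[current][key.strip()].append(value.strip())
    return {acc: dict(attrs) for acc, attrs in entities.items()}


# Hypothetical SOFT fragment for illustration only.
example = """^SAMPLE = GSM_EXAMPLE
!Sample_title = liver, replicate 1
!Sample_characteristics_ch1 = tissue: liver
!Sample_characteristics_ch1 = sex: female
#ID_REF = probe identifier
#VALUE = normalized signal
!sample_table_begin
ID_REF\tVALUE
probe_1\t7.2
!sample_table_end
"""
print(parse_soft_attributes(example)["GSM_EXAMPLE"]["Sample_characteristics_ch1"])
# ['tissue: liver', 'sex: female']
```

Keeping every value for a repeated key matters because characteristic fields such as tissue and sex are commonly given as separate lines under the same attribute name.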

Impact and Applications

The archive underpins secondary analyses in cancer research associated with The Cancer Genome Atlas, translational studies at centers such as the Dana-Farber Cancer Institute and Memorial Sloan Kettering Cancer Center, meta-analyses produced by teams at the Sanger Institute and the Broad Institute, and methods development published in IEEE and ACM journals and conference proceedings. Its datasets support reproducible research in drug discovery pipelines at companies such as Roche, Pfizer, and Novartis, and in public health genomics studies involving the Centers for Disease Control and Prevention and international partners such as the European Centre for Disease Prevention and Control.

Category:Biological databases