ArrayExpress — LLMpedia

ArrayExpress
Title	ArrayExpress
Producer	European Molecular Biology Laboratory
Country	United Kingdom
Languages	English
Cost	Free
Formats	Microarray, RNA-seq, high-throughput sequencing

Contents

Overview
History and Development
Data Content and Scope
Submission and Curation Policies
Access, Retrieval and Tools
Integration with Other Databases

ArrayExpress is a public functional genomics data repository operated by the European Bioinformatics Institute (EMBL-EBI), part of the European Molecular Biology Laboratory. It archives microarray and high-throughput sequencing-based transcriptomics experiments submitted by researchers at institutions such as the Wellcome Sanger Institute, Harvard University, the Massachusetts Institute of Technology, Stanford University, and the Max Planck Society. The resource supports data reuse by linking experiments to related resources, including the Gene Expression Omnibus at the National Center for Biotechnology Information and the European Nucleotide Archive, and to consortia such as the ENCODE Project and the 1000 Genomes Project.

Overview

ArrayExpress provides curated experimental metadata and raw data for transcriptomics studies generated on platforms from vendors including Affymetrix, Illumina, Agilent, and Oxford Nanopore. It serves communities anchored at organizations such as the European Bioinformatics Institute, the European Molecular Biology Laboratory, the National Institutes of Health, the Wellcome Trust, and the Human Cell Atlas initiative. The repository interoperates with bioinformatics tools and frameworks developed at institutions like EMBL-EBI, the Broad Institute, the Wellcome Sanger Institute, and the Francis Crick Institute, and with archives such as the European Genome-phenome Archive. Major scientific projects that rely on archived datasets include ENCODE, GTEx, sequencing efforts linked to the European Nucleotide Archive, and consortia funded by the European Commission and the National Science Foundation.

History and Development

ArrayExpress was established amid the growth of microarray technology pioneered by companies and groups such as Affymetrix, Cold Spring Harbor Laboratory, Stanford University, the Broad Institute, and the Sanger Centre. Early development involved collaborations among the European Bioinformatics Institute, the European Molecular Biology Laboratory, and the Wellcome Trust, with support from national funders including the UK research councils (now part of UK Research and Innovation) and the US National Institutes of Health. Over time the repository has drawn on standards from the Microarray Gene Expression Data Society (later renamed the Functional Genomics Data Society), on input from Human Genome Project contributors, and on sequencing data management practices from the International Nucleotide Sequence Database Collaboration partners: the European Nucleotide Archive at EMBL-EBI, GenBank at the National Center for Biotechnology Information, and the DNA Data Bank of Japan. Influential researchers and institutions associated with its evolution include Michael Ashburner, Ewan Birney, Richard Durbin, and teams at the Sanger Institute, the Broad Institute, and EMBL.

Data Content and Scope

ArrayExpress archives experiments covering organisms studied by the Human Genome Project, the Mouse Genome Sequencing Consortium, the Arabidopsis Genome Initiative, Saccharomyces Genome Database contributors, and other community projects. Data types include microarray CEL files from Affymetrix platforms, raw sequencing reads submitted to the European Nucleotide Archive, processed expression matrices used by groups at Harvard Medical School, Stanford Medicine, and Yale School of Medicine, and single-cell datasets aligned with Human Cell Atlas workflows led by the Broad Institute, the Wellcome Sanger Institute, and the Allen Institute for Brain Science. The repository hosts studies relevant to disease-focused centers such as the National Cancer Institute, Cancer Research UK, and the Institute of Cancer Research, and to projects tied to the Psychiatric Genomics Consortium and the International Cancer Genome Consortium. Datasets span model organisms investigated at Cold Spring Harbor Laboratory, the Max Planck institutes, EMBL, and RIKEN in Japan.

Submission and Curation Policies

Submitters include laboratory groups at universities such as the University of Cambridge, the University of Oxford, and the Massachusetts Institute of Technology, as well as clinical consortia affiliated with Harvard Medical School, Johns Hopkins University, and Karolinska Institutet. Metadata standards draw on the Minimum Information About a Microarray Experiment (MIAME) guidelines, on collaboration with the Functional Genomics Data Society, and on harmonization efforts involving the Global Alliance for Genomics and Health. Curation workflows are informed by practices used at the European Genome-phenome Archive, GenBank, and the Sequence Read Archive, and by quality-control methods developed at the Broad Institute and EMBL-EBI. Data-sharing and embargo policies reflect funding-agency requirements from the Wellcome Trust, the European Research Council, the National Institutes of Health, and national funders in Germany, France, and Japan.
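As a rough illustration of how a MIAME-style completeness check might work before submission, the sketch below tests a submission record against the six elements that MIAME asks for. The dictionary keys (`raw_data`, `protocols`, and so on) and the function name are hypothetical names chosen for this example; they are not ArrayExpress field names, and the real submission tooling may validate quite differently.

```python
from typing import Dict, List

# The six MIAME elements, keyed by hypothetical field names used only here.
MIAME_ELEMENTS: Dict[str, str] = {
    "raw_data": "raw data for each hybridization",
    "processed_data": "final processed (normalized) data",
    "sample_annotation": "sample annotation and experimental factor values",
    "experimental_design": "experimental design and sample-data relationships",
    "array_annotation": "annotation of the array design",
    "protocols": "laboratory and data processing protocols",
}


def missing_miame_elements(submission: Dict[str, object]) -> List[str]:
    """Return the keys of MIAME elements that are absent or empty."""
    return [key for key in MIAME_ELEMENTS if not submission.get(key)]


if __name__ == "__main__":
    draft = {
        "raw_data": ["sample1.CEL", "sample2.CEL"],
        "processed_data": "expression_matrix.tsv",
        "sample_annotation": {"sample1": "control", "sample2": "treated"},
        "experimental_design": "two-group comparison",
        "protocols": [],  # an empty value counts as missing
    }
    for key in missing_miame_elements(draft):
        print(f"missing: {MIAME_ELEMENTS[key]}")
```

A check like this only confirms that something was supplied for each element; judging whether the annotation is adequate remains a curation task.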

Access, Retrieval and Tools

Users retrieve data through the ArrayExpress web interface and through programmatic access points that work with tools from the Bioconductor project, the Galaxy platform, Ensembl, the UCSC Genome Browser, and Cytoscape. Search draws on ontologies developed with contributions from the Open Biomedical Ontologies community and on terminologies from the Human Phenotype Ontology consortium, enabling queries by project identifier, by investigator name or institution (for example EMBL-EBI, the Broad Institute, or Stanford University), and by experimental factor. Analysis pipelines commonly combine ArrayExpress data with Bioconductor packages maintained by the Bioconductor project and by academic groups at Johns Hopkins University and Imperial College London.
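Programmatic retrieval usually starts from an experiment accession. The sketch below shows one way a script might validate an accession and build a link to the study page. It assumes accessions follow the shape `E-` plus a four-letter source code plus a number (for example `E-MTAB-5214`), and it assumes the URL layout in `STUDY_URL_TEMPLATE`; both should be checked against current EMBL-EBI documentation. The class and function names are illustrative only.

```python
import re
from dataclasses import dataclass

# Assumed accession shape: "E-" + four-letter source code + "-" + digits.
ACCESSION_RE = re.compile(r"^E-([A-Z]{4})-(\d+)$")

# Assumed URL layout for study pages; verify before relying on it.
STUDY_URL_TEMPLATE = (
    "https://www.ebi.ac.uk/biostudies/arrayexpress/studies/{accession}"
)


@dataclass(frozen=True)
class Accession:
    source: str  # four-letter code, e.g. "MTAB"
    number: int  # numeric part of the accession

    def __str__(self) -> str:
        return f"E-{self.source}-{self.number}"


def parse_accession(text: str) -> Accession:
    """Parse an experiment accession, tolerating case and outer whitespace."""
    match = ACCESSION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not an experiment accession: {text!r}")
    return Accession(source=match.group(1), number=int(match.group(2)))


def study_url(accession: Accession) -> str:
    """Build a study page link from the assumed URL template."""
    return STUDY_URL_TEMPLATE.format(accession=str(accession))


if __name__ == "__main__":
    acc = parse_accession("e-mtab-5214")
    print(acc, study_url(acc))
```

Validating the accession before any request keeps malformed identifiers out of URLs and gives callers a clear error early. The example makes no network call, so it can be run offline.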

Integration with Other Databases

ArrayExpress maintains bidirectional links and synchronization relationships with the European Nucleotide Archive, the Gene Expression Omnibus at the National Center for Biotechnology Information, Ensembl, UniProt, and Reactome, as well as with the Protein Data Bank, the Human Protein Atlas, and the Mouse Genome Informatics group. Cross-references enable combined queries across data aggregators such as the Catalogue Of Somatic Mutations In Cancer hosted by the Wellcome Sanger Institute, The Cancer Genome Atlas run by the National Cancer Institute and the National Human Genome Research Institute, and pathway resources curated by the Kyoto Encyclopedia of Genes and Genomes. Collaborative interoperability projects include work with the Global Alliance for Genomics and Health, the ELIXIR infrastructure, and the BioSamples database at EMBL-EBI, along with metadata harmonization efforts involving advocates of the FAIR principles and the Research Data Alliance.
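One concrete form such cross-references can take is accession mapping between repositories. The sketch below assumes the convention that a Gene Expression Omnibus series `GSE<n>` imported into ArrayExpress appears as `E-GEOD-<n>`. The function names are hypothetical, and a returned accession says nothing about whether the corresponding record exists, since not every GEO series has an ArrayExpress counterpart.

```python
import re
from typing import Optional

# Assumed identifier shapes for GEO series and GEO-derived ArrayExpress records.
GEO_SERIES_RE = re.compile(r"^GSE(\d+)$")
AE_GEO_IMPORT_RE = re.compile(r"^E-GEOD-(\d+)$")


def geo_to_arrayexpress(geo_series: str) -> Optional[str]:
    """Map 'GSE<n>' to 'E-GEOD-<n>', or return None if the input is not a series."""
    match = GEO_SERIES_RE.match(geo_series.strip().upper())
    return f"E-GEOD-{match.group(1)}" if match else None


def arrayexpress_to_geo(accession: str) -> Optional[str]:
    """Map 'E-GEOD-<n>' back to 'GSE<n>', or return None for other accessions."""
    match = AE_GEO_IMPORT_RE.match(accession.strip().upper())
    return f"GSE{match.group(1)}" if match else None


if __name__ == "__main__":
    print(geo_to_arrayexpress("GSE12345"))
    print(arrayexpress_to_geo("E-MTAB-5214"))
```

Returning `None` rather than raising lets a caller filter a mixed list of accessions without wrapping each lookup in error handling.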

Category:Biological databases