Entrez Gene — LLMpedia

Entrez Gene
Title	Entrez Gene
Producer	National Center for Biotechnology Information
Country	United States
Format	relational database, web interface, API
Launched	early 2000s (successor to LocusLink)

Contents

Overview
History and development
Database content and structure
Access and query methods
Integration with other NCBI resources
Impact and applications in research

Entrez Gene is a curated, gene-specific database produced by the National Center for Biotechnology Information (NCBI), which now presents it simply as Gene. It provided centralized, cross-referenced summaries for genes from diverse organisms, integrating sequence identifiers, nomenclature, functional annotation, and bibliographic links. The resource was tightly connected to other NCBI resources and to external repositories, including those at the European Bioinformatics Institute, the Wellcome Trust Sanger Institute, and major model organism databases.

Overview

Entrez Gene served as a hub linking gene-centric information across databases such as GenBank, RefSeq, PubMed, Gene Expression Omnibus, and the Protein Data Bank. Entries carried standardized symbols from nomenclature authorities such as the HUGO Gene Nomenclature Committee, together with cross-references to UniProt, Ensembl, FlyBase, WormBase, and Mouse Genome Informatics. Coverage ranged from Homo sapiens to model organisms such as Saccharomyces cerevisiae, so the resource served laboratories working across many experimental systems.

History and development

Entrez Gene grew out of computational work at NCBI, which is part of the National Library of Medicine within the National Institutes of Health. Its immediate predecessor was LocusLink, a gene-centered resource NCBI introduced in the late 1990s; Entrez Gene superseded LocusLink in the early 2000s and brought gene records into the Entrez retrieval system alongside GenBank. The work built on NCBI's collaborations with its international sequence-database partners, the European Bioinformatics Institute and the DNA Data Bank of Japan. Over time, input from communities tied to the Human Genome Project, the ENCODE Project, and model organism consortia shaped schema changes, curation workflows, and export formats. Major milestones paralleled RefSeq releases and tighter integration with literature indexed in PubMed and PubMed Central.

Database content and structure

Each record, identified by a stable integer GeneID, combined nomenclature, genomic coordinates, and curated summaries with links to nucleotide and protein records in GenBank and RefSeq. The schema mapped gene-to-transcript relationships, carried evidence tags (including some tied to projects such as the ENCODE Project), and cross-linked to variant and clinical resources such as ClinVar, a database NCBI itself maintains. Data came from NCBI's own annotation and from external contributors, including the Wellcome Trust Sanger Institute, model organism databases such as ZFIN and the Rat Genome Database, and annotation projects affiliated with the Broad Institute and the Max Planck Society.
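The record structure described above is easiest to see in NCBI's bulk `gene_info` file, where each line describes one gene. The sketch below parses one line into a small Python dataclass. It assumes the first nine tab-delimited columns are `tax_id`, `GeneID`, `Symbol`, `LocusTag`, `Synonyms`, `dbXrefs`, `chromosome`, `map_location`, and `description`, with `|` separating multiple values and `-` marking an empty field. Check the header row of the file you actually download, since NCBI has added columns over time. `GeneRecord`, `split_multi`, and `parse_gene_info_line` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed leading columns of NCBI's tab-delimited gene_info file; later
# columns (type_of_gene, nomenclature fields, ...) are ignored in this sketch.
GENE_INFO_COLUMNS = [
    "tax_id", "GeneID", "Symbol", "LocusTag", "Synonyms",
    "dbXrefs", "chromosome", "map_location", "description",
]


@dataclass
class GeneRecord:
    """Hypothetical in-memory view of one gene_info row."""
    tax_id: int
    gene_id: int
    symbol: str
    synonyms: list[str] = field(default_factory=list)
    xrefs: dict[str, list[str]] = field(default_factory=dict)
    chromosome: str = ""
    map_location: str = ""
    description: str = ""


def split_multi(value: str) -> list[str]:
    """Split a '|'-delimited field; a lone '-' means the field is empty."""
    return [] if value in ("", "-") else value.split("|")


def parse_gene_info_line(line: str) -> GeneRecord:
    """Parse one data line (not the '#' header row) of gene_info."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(GENE_INFO_COLUMNS):
        raise ValueError(f"expected at least {len(GENE_INFO_COLUMNS)} columns")
    row = dict(zip(GENE_INFO_COLUMNS, fields))

    # Group cross-references by source database. Split on the first colon
    # only, because some accessions contain one (e.g. "HGNC:HGNC:11998").
    xrefs: dict[str, list[str]] = {}
    for xref in split_multi(row["dbXrefs"]):
        db, _, accession = xref.partition(":")
        xrefs.setdefault(db, []).append(accession)

    return GeneRecord(
        tax_id=int(row["tax_id"]),
        gene_id=int(row["GeneID"]),
        symbol=row["Symbol"],
        synonyms=split_multi(row["Synonyms"]),
        xrefs=xrefs,
        chromosome=row["chromosome"],
        map_location=row["map_location"],
        description=row["description"],
    )


if __name__ == "__main__":
    # Abbreviated, illustrative row for human TP53 (real rows have more columns).
    sample = "9606\t7157\tTP53\t-\tLFS1|P53\tMIM:191170|HGNC:HGNC:11998\t17\t17p13.1\ttumor protein p53"
    print(parse_gene_info_line(sample))
```

Keying the record on the integer GeneID rather than the symbol reflects the design of the database itself: symbols change as nomenclature is revised, while GeneIDs are tracked and stable.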

Access and query methods

Users queried Entrez Gene through the NCBI web portal, programmatically through the Entrez Programming Utilities (E-utilities), and in bulk through tab-delimited files distributed on NCBI's FTP site. Integration with workflow tools relied on standards shared with projects at the European Bioinformatics Institute and with bioinformatics platforms developed at the University of California, San Diego and Cold Spring Harbor Laboratory. Researchers at institutions such as the Broad Institute, and at pharmaceutical companies in hubs such as Cambridge, Massachusetts, often combined gene queries with literature searches in PubMed and sequence retrieval from GenBank in automated pipelines.
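A minimal sketch of programmatic access through the E-utilities: it builds an ESearch request against the `gene` database and extracts the matching GeneIDs from the JSON reply. The endpoint, the `db`/`term`/`retmode`/`retmax` parameters, and the `esearchresult.idlist` response field are taken to follow NCBI's E-utilities documentation; the helper names `esearch_gene_url`, `parse_esearch_ids`, and `search_gene` are hypothetical, and only `search_gene` contacts NCBI.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_gene_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for the 'gene' database, requesting JSON output."""
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def parse_esearch_ids(payload: str) -> list[int]:
    """Return the GeneIDs listed in an ESearch JSON response body."""
    result = json.loads(payload)["esearchresult"]
    return [int(uid) for uid in result.get("idlist", [])]


def search_gene(term: str) -> list[int]:
    """Run the search over the network (only executed when called)."""
    with urlopen(esearch_gene_url(term), timeout=30) as response:
        return parse_esearch_ids(response.read().decode("utf-8"))
```

For example, `search_gene("TP53[sym] AND human[orgn]")` uses Entrez field tags to restrict the match to the gene symbol in human records. Separating URL construction and response parsing from the network call keeps both testable offline. NCBI asks clients without an API key to stay at or below three requests per second, so a real pipeline would throttle and batch its calls.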

Integration with other NCBI resources

Entrez Gene functioned as an index into other NCBI assets: nucleotide accessions in GenBank, curated protein records in RefSeq, expression datasets in the Gene Expression Omnibus, structures derived from the Protein Data Bank, and literature in PubMed. Each record was anchored to the NCBI Taxonomy database, which supplied the organism classification. These cross-resource links made possible the combined queries used in studies associated with large consortia such as the 1000 Genomes Project and The Cancer Genome Atlas.
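The gene-to-literature links behind such combined queries are also distributed in bulk, in NCBI's `gene2pubmed` file. The sketch below assumes that file's three tab-delimited columns are `tax_id`, `GeneID`, and `PubMed_ID`, with a header line starting with `#`, and groups PubMed IDs by GeneID, optionally for a single organism. The function name `gene_to_pubmed` is hypothetical, and the rows in the usage example are illustrative placeholders, not real citations.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def gene_to_pubmed(lines: Iterable[str], tax_id: int | None = None) -> dict[int, set[int]]:
    """Group PubMed IDs by GeneID from gene2pubmed-style rows.

    Assumes tab-delimited columns tax_id, GeneID, PubMed_ID and a header
    line beginning with '#'. If tax_id is given, other organisms are skipped.
    """
    links: dict[int, set[int]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        tax, gene_id, pmid = line.rstrip("\n").split("\t")[:3]
        if tax_id is not None and int(tax) != tax_id:
            continue
        links[int(gene_id)].add(int(pmid))
    return dict(links)


if __name__ == "__main__":
    # Illustrative rows; the PubMed IDs are placeholders, not real citations.
    rows = [
        "#tax_id\tGeneID\tPubMed_ID",
        "9606\t7157\t1000001",
        "9606\t7157\t1000002",
        "10090\t22059\t1000003",
    ]
    human = gene_to_pubmed(rows, tax_id=9606)
    print({gene: sorted(pmids) for gene, pmids in human.items()})
    # {7157: [1000001, 1000002]}
```

Because the function accepts any iterable of lines, the real compressed file can be streamed without loading it into memory, for example by passing `gzip.open("gene2pubmed.gz", "rt")`.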

Impact and applications in research

Entrez Gene accelerated gene-centric work in genetics, genomics, and molecular biology by consolidating the identifiers and annotations used in analyses at universities such as Harvard University, at research centers including the Broad Institute, and in industry laboratories. Its stable identifiers underpinned gene set definitions for pathway analyses drawing on resources such as KEGG and Reactome, and its links to ClinVar supported translational work such as clinical variant interpretation. The resource also influenced bioinformatics tool development at groups such as those at the European Molecular Biology Laboratory and training programs at institutions like Cold Spring Harbor Laboratory.
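Gene set analyses usually begin by mapping the symbols in a published list to stable GeneIDs, and the synonym lists Entrez Gene consolidated make that mapping possible but occasionally ambiguous. The sketch below assumes all records come from a single organism and are supplied as hypothetical `(gene_id, symbol, synonyms)` tuples, for instance parsed from `gene_info`. It prefers official symbols, falls back to a synonym only when that synonym names exactly one gene, and reports ambiguous and unmatched symbols instead of guessing. The function names and the records in the usage example are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def build_symbol_index(
    records: Iterable[tuple[int, str, list[str]]],
) -> tuple[dict[str, int], dict[str, set[int]]]:
    """Index official symbols and synonyms (upper-cased) to GeneIDs."""
    official: dict[str, int] = {}
    by_synonym: dict[str, set[int]] = defaultdict(set)
    for gene_id, symbol, synonyms in records:
        official[symbol.upper()] = gene_id
        for synonym in synonyms:
            by_synonym[synonym.upper()].add(gene_id)
    return official, dict(by_synonym)


def resolve_gene_set(
    symbols: Iterable[str],
    official: dict[str, int],
    by_synonym: dict[str, set[int]],
) -> tuple[dict[str, int], dict[str, list[int]], list[str]]:
    """Map symbols to GeneIDs, separating ambiguous and unknown symbols."""
    resolved: dict[str, int] = {}
    ambiguous: dict[str, list[int]] = {}
    missing: list[str] = []
    for symbol in symbols:
        key = symbol.upper()
        candidates = by_synonym.get(key, set())
        if key in official:
            resolved[symbol] = official[key]  # official symbol always wins
        elif len(candidates) == 1:
            resolved[symbol] = next(iter(candidates))  # unique synonym
        elif candidates:
            ambiguous[symbol] = sorted(candidates)  # synonym shared by genes
        else:
            missing.append(symbol)
    return resolved, ambiguous, missing


if __name__ == "__main__":
    # Hypothetical records: (GeneID, official symbol, synonyms).
    official, by_synonym = build_symbol_index([
        (101, "GENEA", ["ALIAS1", "SHARED"]),
        (102, "GENEB", ["SHARED"]),
    ])
    print(resolve_gene_set(["genea", "ALIAS1", "SHARED", "NOPE"], official, by_synonym))
    # ({'genea': 101, 'ALIAS1': 101}, {'SHARED': [101, 102]}, ['NOPE'])
```

Matching is case-insensitive here for simplicity, which is only safe within one organism: the human and mouse p53 genes, for example, are written TP53 and Trp53.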

Category:Biological databases