Chemical Entities of Biological Interest

Name	ChEBI
Title	Chemical Entities of Biological Interest
Producer	European Bioinformatics Institute
Country	United Kingdom
Cost	Free
Disciplines	Bioinformatics; Chemistry; Molecular Biology
Bib	Smith et al. (2005)

Contents

Overview
Database Content and Structure
Data Curation and Governance
Access and Integration
Applications and Use Cases
History and Development

Chemical Entities of Biological Interest (ChEBI) is a freely available, curated chemical ontology and database that catalogs small molecular entities relevant to biochemistry, molecular biology, and pharmacology. It provides structured identifiers, ontological classifications, chemical structures, and cross-references that support data integration across resources such as UniProt, Ensembl, PubChem, and DrugBank. The resource underpins computational workflows used by projects associated with the European Bioinformatics Institute, the European Molecular Biology Laboratory, and international initiatives including the Human Genome Project and the Human Proteome Project.

Overview

The resource assigns unique, stable identifiers and hierarchical ontology terms to molecular entities including metabolites, drugs, cofactors, and natural products, and links each entity to chemical structure representations and semantic relationships. These are used by researchers at institutions such as the Wellcome Trust Sanger Institute, the Broad Institute, and the National Institutes of Health. It interoperates with databases like KEGG, Reactome, and MetaCyc to enable pathway mapping, and aligns its nomenclature with standards organizations including the World Health Organization and the International Union of Pure and Applied Chemistry. The database supports automated reasoning over relationships such as "is a" and "has role", which facilitates its use in computational platforms developed by teams at Google DeepMind, IBM Research, and academic groups at the Massachusetts Institute of Technology and Stanford University.
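The kind of reasoning that "is a" and "has role" relationships make possible can be illustrated with a minimal sketch. It assumes that a role asserted on a class also applies to its subclasses; the graph, the `EX:` identifiers, and the function names are hypothetical and are not taken from the database.

```python
# Toy ontology fragment. All identifiers are invented placeholders (EX:...),
# not real database entries.
IS_A = {
    "EX:child": ["EX:parent"],
    "EX:parent": ["EX:grandparent"],
    "EX:grandparent": ["EX:root"],
    "EX:root": [],
}
HAS_ROLE = {
    "EX:child": ["EX:role_a"],
    "EX:parent": ["EX:role_b"],
}


def ancestors(term, is_a):
    """Return every term reachable from `term` by following "is a" edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def inferred_roles(term, is_a, has_role):
    """Collect roles asserted on the term itself or on any of its ancestors.

    Assumption: "has role" statements are inherited down the "is a" hierarchy.
    """
    roles = set(has_role.get(term, []))
    for parent in ancestors(term, is_a):
        roles.update(has_role.get(parent, []))
    return roles


print(sorted(ancestors("EX:child", IS_A)))
print(sorted(inferred_roles("EX:child", IS_A, HAS_ROLE)))
```

An iterative traversal with a `seen` set is used so that the sketch also terminates on graphs in which a term has several parents that share a common ancestor.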

Database Content and Structure

Entries combine curated metadata, structural representations (SMILES, InChI), and ontology annotations that reference experimental reagents, enzymatic cofactors, and approved therapeutics recognized by regulatory bodies like the European Medicines Agency and the U.S. Food and Drug Administration. Schema elements map to controlled vocabularies from organizations such as the Open Biological and Biomedical Ontology Foundry and link to identifiers in external resources including ChEMBL, PDB, and Gene Ontology. The hierarchical ontology organizes entities by chemical class, biological role, and structural features. This enables queries that integrate with pathway databases maintained by the groups behind Reactome and BioCyc and with sequence resources curated by the UniProt Consortium and GenBank.
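As an illustration of how such an entry might be represented and queried in memory, the following sketch models an entry as a small record. The `Entry` class, its field names, and the `with_role` helper are hypothetical and do not reflect the database's actual schema. The role labels are simplified placeholder strings rather than ontology terms, and the sample values are illustrative and should be checked against the live database before reuse.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """Hypothetical in-memory view of one entry (not the real schema)."""
    chebi_id: str                 # identifier of the form "CHEBI:<digits>"
    name: str
    smiles: str                   # line notation for the structure
    inchi: str                    # IUPAC International Chemical Identifier
    roles: frozenset = field(default_factory=frozenset)  # placeholder labels


# Illustrative sample data; role labels are simplified placeholders.
ENTRIES = [
    Entry("CHEBI:15377", "water", "O", "InChI=1S/H2O/h1H2",
          roles=frozenset({"solvent"})),
    Entry("CHEBI:16236", "ethanol", "CCO",
          "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
          roles=frozenset({"solvent", "antiseptic"})),
]


def with_role(entries, role):
    """Return the names of entries annotated with the given role label."""
    return [entry.name for entry in entries if role in entry.roles]


print(with_role(ENTRIES, "solvent"))
print(with_role(ENTRIES, "antiseptic"))
```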

Data Curation and Governance

Curation follows best-practice pipelines influenced by standards from bodies like the International Council for Science and by practices used at centers including the European Molecular Biology Laboratory and the National Center for Biotechnology Information. Content is reviewed by expert curators and community contributors from universities such as the University of Cambridge, Harvard University, and the University of Oxford, and by consortium members linked to projects like the Proteomics Standards Initiative. Governance includes versioning, provenance tracking, and licensing compatible with open science initiatives championed by funders such as the Wellcome Trust and the European Commission. Quality control combines automated validation routines with manual review, as in knowledgebases like Ensembl and UniProt.
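The combination of automated validation and manual review can be sketched as a triage step: records that pass simple format checks are accepted, and the rest are queued for a curator. The checks, the record layout, and the function names below are assumptions chosen for illustration and do not describe the resource's actual pipeline.

```python
import re

# Identifiers are assumed to have the form "CHEBI:" followed by digits.
CHEBI_ID = re.compile(r"^CHEBI:\d+$")


def validate(record):
    """Return a list of problems found in a record; empty means it passes."""
    problems = []
    if not CHEBI_ID.match(record.get("id", "")):
        problems.append("identifier is not of the form CHEBI:<digits>")
    if not record.get("name", "").strip():
        problems.append("name is missing")
    inchi = record.get("inchi", "")
    if inchi and not inchi.startswith("InChI="):
        problems.append("InChI string lacks the 'InChI=' prefix")
    return problems


def triage(records):
    """Split records into accepted ones and ones queued for manual review."""
    accepted, review_queue = [], []
    for record in records:
        problems = validate(record)
        if problems:
            review_queue.append((record, problems))
        else:
            accepted.append(record)
    return accepted, review_queue


submissions = [
    {"id": "CHEBI:15377", "name": "water", "inchi": "InChI=1S/H2O/h1H2"},
    {"id": "15377", "name": "", "inchi": "1S/H2O/h1H2"},  # deliberately broken
]
accepted, review_queue = triage(submissions)
print(len(accepted), "accepted;", len(review_queue), "queued for review")
```

Returning the list of problems alongside each queued record, rather than a bare pass/fail flag, gives a reviewer the reason a record was held back.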

Access and Integration

The resource is accessible via web interfaces, downloadable flat files, SPARQL endpoints, and RESTful APIs adopted by platforms at EMBL-EBI and by commercial partners including Elsevier and Clarivate. Cross-references enable integration into workflows run on cloud providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, and into analysis pipelines implemented with tools from the Bioconductor project and the Galaxy Project. Semantic web compatibility allows linkage with triplestores operated by institutions like the European Bioinformatics Institute and the University of Oxford for federated queries across datasets from PDB-Dev, ArrayExpress, and MGnify.
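Working with a downloaded flat file can be illustrated with a minimal sketch, assuming the file uses the OBO stanza format common to ontology distributions. The sample text, the `EX:` identifiers, and the `parse_obo_terms` function are hypothetical. The parser handles only the few tags shown and is not a complete OBO implementation.

```python
# Small inline sample in OBO stanza style. Identifiers are invented placeholders.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:1
name: example acid
is_a: EX:2 ! example parent class
relationship: has_role EX:9 ! example role

[Term]
id: EX:2
name: example parent class

[Typedef]
id: has_role
name: has role
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into a dict keyed by term identifier."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"is_a": [], "relationship": []}
            continue
        if line.startswith("["):          # another stanza type, e.g. [Typedef]
            current = None
            continue
        if not line or current is None or ":" not in line:
            continue                      # blank line, header, or ignored stanza
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.split("!")[0].strip()   # drop a trailing "! comment"
        if key == "id":
            current["id"] = value
            terms[value] = current
        elif key == "name":
            current["name"] = value
        elif key == "is_a":
            current["is_a"].append(value)
        elif key == "relationship":
            relation, target = value.split(None, 1)
            current["relationship"].append((relation, target))
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
for term_id, term in terms.items():
    print(term_id, term["name"], term["is_a"], term["relationship"])
```

For production use, an established ontology library would be a safer choice than a hand-written parser, since the full format includes escaping rules, qualifiers, and further tags that this sketch ignores.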

Applications and Use Cases

Researchers use the database for metabolomics annotation in studies led by groups at the Max Planck Society and Lawrence Berkeley National Laboratory, for drug repurposing analyses linked to projects at NIH NCATS and Imperial College London, and for systems biology modeling by teams at the Center for Systems Biology and the European Molecular Biology Laboratory. Clinical informatics projects at hospitals such as Mayo Clinic and Johns Hopkins Hospital integrate identifiers to reconcile laboratory chemistry results with electronic health record data mapped to standards such as the International Classification of Diseases and SNOMED CT, which is maintained by SNOMED International. Educational initiatives use the ontology in curricula at institutions including University College London and the California Institute of Technology.

History and Development

The resource originated in efforts within the European Bioinformatics Institute and in collaborations with academic groups and funding agencies including the Wellcome Trust and the European Commission. Over time it expanded through community submissions, integration with projects like PubChem and ChEMBL, and adoption by consortia such as the Open PHACTS project and the ELIXIR infrastructure. Milestones include extension of the ontology to encompass complex ions and polymers, adoption of semantic web standards promoted at conferences like ISMB and the Bio-Ontologies workshop, and sustained funding partnerships involving organizations such as the European Research Council.
