
ChEBI
Name	ChEBI
Scope	small molecules and ions
Maintained by	European Bioinformatics Institute
Launched	2004
Access	free

Contents

Overview
Data Model and Content
Curation and Ontology Structure
Access and Tools
Integration and Applications
History and Governance

ChEBI (Chemical Entities of Biological Interest) is a freely available, curated database and ontology of small molecular entities of biological interest. It is produced by the European Bioinformatics Institute and serves researchers in molecular biology, pharmacology, metabolomics, and cheminformatics by providing standardized identifiers, structures, and semantic relationships. The resource links chemical information to biological databases and ontologies to support data integration, annotation, and computational analysis.

Overview

ChEBI provides structured descriptions of small molecules, including stereochemistry, ionization, and roles, and connects entries to external resources maintained by organizations such as the European Bioinformatics Institute, the Protein Data Bank in Europe, the Human Protein Atlas, and the UniProt Consortium. The database supports interoperability with community standards from the International Union of Pure and Applied Chemistry, the World Health Organization, the National Institutes of Health, and the World Wide Web Consortium, promoting reuse across projects like the Gene Ontology, Reactome, KEGG, and DrugBank.

Data Model and Content

The ChEBI data model represents each chemical entity with a standardized identifier, a canonical name, synonyms, a molecular formula, InChI and SMILES strings, and a two-dimensional structure. Each entity is linked to metadata curated by teams at the European Bioinformatics Institute and by contributors from academic groups such as the European Molecular Biology Laboratory, the Wellcome Sanger Institute, and the Max Planck Society. Entries are annotated with roles and biological activities that reference controlled vocabularies used by the Human Genome Organisation, the International Union of Biochemistry and Molecular Biology, and the Medical Research Council. Cross-references connect records to external identifiers from PubChem, ChemSpider, the ZINC database, ChEMBL, the Protein Data Bank, and the Comparative Toxicogenomics Database, which facilitates cheminformatics workflows at research institutions like Harvard University, Massachusetts Institute of Technology, Stanford University, and the University of Cambridge.
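The fields listed above can be pictured as a simple record type. The following is a minimal sketch, not ChEBI's actual schema: the `ChemicalEntity` class, its field names, and the `normalize_chebi_id` helper are hypothetical and chosen only for illustration. The example values are those of water (CHEBI:15377).

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Accession pattern: an optional "CHEBI:" prefix followed by digits.
_CHEBI_ID = re.compile(r"^(?:CHEBI:)?(\d+)$", re.IGNORECASE)


def normalize_chebi_id(raw: str) -> str:
    """Return an accession in the 'CHEBI:<number>' form (hypothetical helper)."""
    match = _CHEBI_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not a ChEBI accession: {raw!r}")
    return f"CHEBI:{int(match.group(1))}"


@dataclass
class ChemicalEntity:
    """Illustrative in-memory record; field names are assumptions."""

    chebi_id: str
    name: str
    formula: str
    inchi: str
    smiles: str
    synonyms: list[str] = field(default_factory=list)
    # Maps an external database name to that database's identifier.
    xrefs: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Store the accession in one consistent form.
        self.chebi_id = normalize_chebi_id(self.chebi_id)


water = ChemicalEntity(
    chebi_id="chebi:15377",
    name="water",
    formula="H2O",
    inchi="InChI=1S/H2O/h1H2",
    smiles="O",
    synonyms=["H2O", "oxidane"],
)
print(water.chebi_id)
```

Normalizing the accession on construction means that records obtained from different sources can be compared or joined on `chebi_id` without further cleanup.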

Curation and Ontology Structure

Curation in ChEBI is performed by expert curators who apply evidence from primary literature published in journals such as Nature, Science, the Journal of Chemical Education, The Lancet, and the Journal of Biological Chemistry, and who coordinate with standards bodies and communities such as IUPAC and the Open Biological and Biomedical Ontologies (OBO) community. The ontology structure distinguishes between molecular entities, classes, and roles, enabling integration with ontologies developed by the Gene Ontology Consortium, the OBO Foundry, and the National Center for Biomedical Ontology. Semantic relations used in the ontology, such as is_a and has_role, are harmonized with practices from resources like SNOMED CT, MeSH, and the International Classification of Diseases to support interoperability with clinical and translational projects at organizations such as the National Institutes of Health, Centers for Disease Control and Prevention, and World Health Organization.
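The distinction between classes and roles can be illustrated with a small graph traversal. This is a sketch over a toy ontology: every identifier prefixed with `EX:` is made up and is not a real ChEBI accession, and the `ancestors` and `roles` functions are hypothetical helpers. It assumes that a role asserted on a class also applies to the subclasses of that class.

```python
from __future__ import annotations

from collections import deque

# Toy is_a hierarchy (child -> parents). All "EX:" identifiers are hypothetical.
IS_A = {
    "EX:ethanol": ["EX:primary_alcohol"],
    "EX:primary_alcohol": ["EX:alcohol"],
    "EX:alcohol": ["EX:organic_molecular_entity"],
    "EX:organic_molecular_entity": [],
}

# Toy has_role assertions (entity or class -> roles).
HAS_ROLE = {
    "EX:ethanol": ["EX:solvent"],
    "EX:alcohol": ["EX:example_role"],
}


def ancestors(term: str) -> set[str]:
    """All classes reachable from `term` by following is_a edges upward."""
    seen: set[str] = set()
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def roles(term: str) -> set[str]:
    """Roles asserted on `term` itself or on any of its is_a ancestors."""
    found = set(HAS_ROLE.get(term, []))
    for parent in ancestors(term):
        found.update(HAS_ROLE.get(parent, []))
    return found


print(sorted(ancestors("EX:ethanol")))
print(sorted(roles("EX:ethanol")))
```

Keeping structural classification (`IS_A`) separate from roles (`HAS_ROLE`) lets a query ask either "what kind of molecule is this?" or "what does this molecule do?" without mixing the two hierarchies.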

Access and Tools

Users access ChEBI via a web interface hosted at the European Bioinformatics Institute, programmatic RESTful APIs, and downloadable files. The downloads can be processed with open-source and commercial tools, including the R environment for statistical computing, Python libraries, and cheminformatics toolkits such as RDKit and Open Babel. Visualization and analysis integrate with platforms like Cytoscape, KNIME, Galaxy, and Bioconductor to support studies at research centers including the Broad Institute and the European Molecular Biology Laboratory. Training and outreach have been conducted in collaboration with universities like University College London, the University of Oxford, and ETH Zurich.
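As an illustration of working with a downloaded file, the sketch below reads term stanzas from text in the OBO flat-file format, one of the formats in which the ChEBI ontology is distributed. It is a deliberately minimal parser written for this example, not an official client: `parse_obo_terms` is a hypothetical helper, the sample stanzas and their `EX:` identifiers are invented, and only the `id`, `name`, `is_a`, and `relationship` tags are handled.

```python
# Invented sample in OBO flat-file syntax; the "EX:" identifiers are hypothetical.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example alcohol
is_a: EX:0000002 ! example parent class
relationship: has_role EX:0000003 ! example role

[Term]
id: EX:0000002
name: example parent class
"""


def parse_obo_terms(text):
    """Return {term_id: record} for every [Term] stanza in `text`."""
    terms = {}
    current = None  # record of the stanza being read, or None outside [Term]
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! comment" that repeats the parent's name.
            current["is_a"].append(value.split(" ! ")[0].strip())
        elif tag == "relationship":
            rel_type, target = value.split(" ! ")[0].split()[:2]
            current["relationship"].append((rel_type, target))
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
print(terms["EX:0000001"]["relationship"])
```

For real analyses an established ontology library is usually a better choice than a hand-written parser, since the full format includes escaping rules, typedefs, and other tags that this sketch ignores.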

Integration and Applications

ChEBI is integrated into workflows for metabolomics, drug discovery, toxicology, and systems biology used by consortia including groups continuing the work of the Human Genome Project, the Human Cell Atlas, The Cancer Genome Atlas, and the Human Proteome Organization. It underpins annotation in pathway resources such as Reactome, MetaCyc, and WikiPathways, and supports data exchange with commercial and public resources including PubChem, DrugBank, ChEMBL, ZINC, and HMDB. Applied projects that use ChEBI identifiers include clinical informatics efforts at NHS Digital, translational research at the National Cancer Institute, environmental health studies coordinated by the Environmental Protection Agency, and industrial R&D at pharmaceutical companies like GlaxoSmithKline, Pfizer, Novartis, and Roche.

History and Governance

ChEBI began as a project within the European Bioinformatics Institute in the early 2000s and evolved through collaborations with academic groups and funding agencies across Europe and North America such as the Wellcome Trust, the European Molecular Biology Laboratory, and the Biotechnology and Biological Sciences Research Council. Governance combines institutional stewardship at EMBL-EBI with community-driven contributions from researchers affiliated with universities and organizations including the University of Cambridge, Imperial College London, Cold Spring Harbor Laboratory, and the Max Planck Society. Development milestones have been reported in peer-reviewed venues and presented at meetings convened by organizations like the International Society for Computational Biology, the American Chemical Society, and the Royal Society of Chemistry.
