KEGG — LLMpedia

KEGG
Name	KEGG
Developer	Kanehisa Laboratories
Released	1995
Latest release	ongoing
Genre	Bioinformatics database

Contents

Overview
Databases and Components
Data Sources and Curation
Tools and Applications
Access and Licensing
Impact and Usage in Research

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource that integrates genomic, chemical, and systems information for interpreting high-level functions of biological systems. It connects sequence data from projects such as the Human Genome Project and the 1000 Genomes Project to pathway representations used in studies by institutions like the National Institutes of Health and by companies including Pfizer and Genentech. The resource underpins analyses in functional genomics, metabolic engineering, and drug development, including collaborations with organizations such as the European Bioinformatics Institute and the Swiss Institute of Bioinformatics.

Overview

KEGG provides structured representations of molecular networks, along with catalogs of genes, proteins, biochemical reactions, and small molecules, used by researchers at institutions such as Harvard University, Stanford University, the Massachusetts Institute of Technology, the University of Tokyo, and Cold Spring Harbor Laboratory. Its pathway maps and reference databases enable cross-referencing among projects such as the ENCODE Project, the GTEx Project, the Cancer Genome Atlas, the International HapMap Project, and the Human Microbiome Project. The platform supports comparative analyses that inform work ranging from pharmaceutical companies, including Novartis and AstraZeneca, to research organizations such as the Max Planck Society and the institutes of the National Institutes of Health.

Databases and Components

The resource consists of multiple interlinked databases: a gene catalog used in studies at the Wellcome Sanger Institute and the Broad Institute, a pathway map collection employed by groups such as the European Molecular Biology Laboratory, a chemical compound repository relevant to Merck Research Laboratories and Eli Lilly and Company, and enzyme and reaction datasets useful to laboratories at the California Institute of Technology and Johns Hopkins University. Key components include pathway modules referenced in publications by researchers at the University of California, Berkeley, metabolic reaction networks used by the Tokyo Institute of Technology, and orthology assignments comparable to resources from the UniProt Consortium, the Gene Ontology Consortium, and NCBI. The integration supports cross-links to databases maintained by Wikimedia Foundation projects and to large-scale resources such as PubChem and ChEMBL.
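The interlinking described above can be pictured as a two-column table of identifier pairs. The following is a minimal sketch, assuming tab-separated input in which each line pairs a gene identifier with a pathway identifier; the sample rows and the function names `parse_links` and `invert_links` are illustrative and are not taken from KEGG itself.

```python
from collections import defaultdict

# Illustrative sample only: tab-separated (gene, pathway) identifier pairs.
SAMPLE_LINKS = (
    "hsa:7157\tpath:hsa04115\n"
    "hsa:7157\tpath:hsa04110\n"
    "hsa:1026\tpath:hsa04115\n"
)


def parse_links(text):
    """Parse two-column, tab-separated link data into {source: {targets}}."""
    forward = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, target = line.split("\t")
        forward[source].add(target)
    return dict(forward)


def invert_links(links):
    """Reverse a {source: {targets}} mapping into {target: {sources}}."""
    reverse = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            reverse[target].add(source)
    return dict(reverse)


gene_to_pathways = parse_links(SAMPLE_LINKS)
pathway_to_genes = invert_links(gene_to_pathways)
```

Storing both directions as dictionaries of sets keeps lookups cheap in either direction, which is one plausible way to navigate between a gene catalog and a pathway collection.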

Data Sources and Curation

Primary data inputs come from genome sequencing centers, including the Broad Institute and the Wellcome Sanger Institute, from chemical records catalogued by the Chemical Abstracts Service, and from literature in journals issued by publishers such as Nature Publishing Group, Science/AAAS, and Cell Press. Manual curation is carried out by teams associated with academic institutions such as the University of Tokyo, in collaboration with international groups including curators at the European Bioinformatics Institute. Automated mapping pipelines incorporate sequence annotations from databases such as RefSeq and orthology predictions aligned with projects such as Ensembl and UniProtKB. These pipelines maintain provenance traceability for workflows used by consortia such as the Global Alliance for Genomics and Health.

Tools and Applications

The platform provides analytical tools for pathway mapping, enrichment analysis, network visualization, and metabolic reconstruction, used by researchers at MIT, ETH Zurich, and the University of Cambridge and by industry labs at Roche. Implementations can be integrated into pipelines that use software from groups such as Bioconductor, the Cytoscape Consortium, and the Galaxy Project. Applications include interpreting transcriptomic datasets from projects such as the ENCODE Project and the Roadmap Epigenomics Project, guiding synthetic biology efforts such as work by SynBioBeta-affiliated teams, and informing pharmacogenomics studies undertaken at Stanford University School of Medicine and Dana–Farber Cancer Institute.
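Pathway enrichment analysis of the kind mentioned above is commonly framed as an over-representation test. The sketch below uses the upper tail of the hypergeometric distribution, which is one standard choice and not necessarily the method used by any particular KEGG tool; the gene names, pathway names, and function names are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, background_size, pathway_size, query_size):
    """Upper-tail hypergeometric probability P(X >= hits)."""
    upper = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    favorable = sum(
        comb(pathway_size, i)
        * comb(background_size - pathway_size, query_size - i)
        for i in range(hits, upper + 1)
    )
    return favorable / total


def pathway_enrichment(query_genes, pathways, background):
    """Return {pathway: (hit count, p-value)} for each pathway gene set."""
    background = set(background)
    query = set(query_genes) & background  # ignore genes outside the background
    results = {}
    for name, members in pathways.items():
        members = set(members) & background
        hits = len(query & members)
        p_value = enrichment_p_value(
            hits, len(background), len(members), len(query)
        )
        results[name] = (hits, p_value)
    return results


# Hypothetical toy data: ten background genes and two five-gene pathways.
background = [f"g{i}" for i in range(1, 11)]
pathways = {
    "pathway_a": ["g1", "g2", "g3", "g4", "g5"],
    "pathway_b": ["g6", "g7", "g8", "g9", "g10"],
}
results = pathway_enrichment(["g1", "g2", "g3", "g4", "g5"], pathways, background)
```

In this toy case every query gene falls in `pathway_a`, so its p-value is 1/252, while `pathway_b` has no hits and a p-value of 1. Real analyses would also correct for testing many pathways at once.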

Access and Licensing

Access policies balance open academic use, similar to that of resources from the European Bioinformatics Institute, with subscription models like those employed by commercial providers such as Elsevier and Thomson Reuters. Academic users at institutions such as the University of California system and University College London often access the public components, while commercial licensing aligns with practices adopted by companies such as Pfizer and Johnson & Johnson. The distribution model interacts with data standards advocated by the World Wide Web Consortium and with licensing frameworks referenced by organizations such as Creative Commons and the Open Data Institute.
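For programmatic access to the public components, KEGG documents a REST-style interface at rest.kegg.jp. The following is a minimal sketch that only builds request URLs and makes no network calls; the helper name `kegg_url` is hypothetical, and the operation list reflects the publicly documented operations as assumed here, so it should be checked against the current KEGG API documentation and terms of use.

```python
from urllib.parse import quote

KEGG_REST_BASE = "https://rest.kegg.jp"

# Assumption: operation names as listed in the public KEGG API documentation.
ALLOWED_OPERATIONS = {"info", "list", "find", "get", "conv", "link", "ddi"}


def kegg_url(operation, *arguments):
    """Build a KEGG REST URL of the form <base>/<operation>/<arg>/..."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    # Keep ':' and '+' unescaped, since identifiers such as 'hsa:7157' use them.
    parts = [quote(str(argument), safe=":+") for argument in arguments]
    return "/".join([KEGG_REST_BASE, operation, *parts])


list_url = kegg_url("list", "pathway", "hsa")
get_url = kegg_url("get", "hsa:7157")
```

Separating URL construction from the request itself makes it easier to review which endpoints a pipeline touches before any licensing-sensitive bulk access takes place.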

Impact and Usage in Research

The resource has been cited in literature from laboratories at Harvard Medical School, Yale School of Medicine, Imperial College London, and Peking University for pathway-centric interpretation of high-throughput data. It has informed drug target identification in studies associated with GlaxoSmithKline and metabolic engineering projects at ETH Zurich and MIT spinouts. Citation networks show integration with work from major projects including the Human Genome Project, the Cancer Genome Atlas, and the Human Microbiome Project, and the resource continues to be incorporated into bioinformatics curricula at universities such as the University of California, San Diego and Kyoto University.

Category:Biological databases