CDD (Conserved Domain Database)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: Conserved Domain Database
Alt: CDD
Developed by: National Center for Biotechnology Information
Type: Biological database
Discipline: Molecular biology

The Conserved Domain Database (CDD) is a curated collection of protein domain models maintained by the National Center for Biotechnology Information (NCBI). It supports the annotation of protein sequences and the discovery of evolutionary relationships. It integrates models from multiple sources and interoperates with resources for sequence analysis, structural biology, and genomics. NCBI is a division of the National Library of Medicine at the National Institutes of Health, and the resource is also used in workflows at university laboratories and in collaborations with international databases.

Overview

CDD provides curated protein domain models that represent evolutionarily conserved regions of proteins. The models relate to protein entries in UniProt, RefSeq, and GenBank, and link to structural information from the Protein Data Bank and to functional annotations from resources such as Gene Ontology and Ensembl. Its development has been influenced by the Human Genome Project and by collaborations with groups associated with the European Bioinformatics Institute, the Wellcome Trust, and university research centers.

Database Content and Curation

Content consists of domain models derived from curated alignments and profile resources. Sources include models curated at NCBI and models imported from Pfam, SMART, COG, and TIGRFAMs. The Conserved Domain Architecture Retrieval Tool (CDART) is a companion NCBI tool that searches for proteins by domain architecture; it draws on CDD rather than contributing models to it. Curators annotate domain boundaries, conserved motifs, and functional sites by cross-referencing experimentally characterized proteins in the Protein Data Bank with sequence collections such as UniProtKB, RefSeq, and GenBank. Curation protocols reference standards from funding and policy stakeholders such as the National Human Genome Research Institute and the Howard Hughes Medical Institute, and from consortia tied to the European Molecular Biology Laboratory and the Wellcome Sanger Institute. Quality control employs methods developed by computational biology groups at institutions such as MIT, Stanford University, Harvard University, and the University of California system.
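The idea of a domain model derived from a curated alignment can be illustrated with a toy profile. The sketch below is an assumption-laden simplification and not CDD's actual model-building procedure: it uses a uniform background distribution, a flat pseudocount, and an invented four-sequence alignment. The function names `build_profile`, `score_window`, and `best_match` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Turn equal-length aligned sequences into per-column log-odds scores.

    Assumes a uniform background frequency; real profiles use observed
    background frequencies and more careful sequence weighting.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)
    profile = []
    for col in range(length):
        # Gap characters and other non-residue symbols are ignored.
        counts = Counter(row[col] for row in alignment if row[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        column = {}
        for aa in ALPHABET:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def score_window(profile, segment):
    """Sum the column scores for one ungapped segment (unknown residues score 0)."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


def best_match(profile, sequence):
    """Return (start, score) of the best ungapped window, or None if too short."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = score_window(profile, sequence[start:start + width])
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Invented alignment of a short conserved motif, for illustration only.
toy_alignment = ["GKSTL", "GKSSL", "GKTTL", "GRSTL"]
toy_profile = build_profile(toy_alignment)
match = best_match(toy_profile, "MAAGKSTLQQ")
```

Here `match` holds the zero-based start of the window that best fits the toy profile, together with its score. Production profile searches also handle gaps and score statistics, which this sketch omits.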

Data Access and Tools

Users access domain models and annotations through web interfaces and programmatic services provided by the National Center for Biotechnology Information. The main search service, CD-Search, compares a query protein against the domain models using RPS-BLAST, a variant of BLAST that searches a database of profiles. These tools fit into BLAST-based pipelines and sequence viewers used at organizations such as the Broad Institute, EMBL-EBI, and Cold Spring Harbor Laboratory. Visualization and interactive exploration link to structure viewers that draw on the Protein Data Bank and to tools popularized by research groups at Johns Hopkins University, Yale University, and the University of California system. Programmatic access is compatible with workflows run at NCBI, EMBL-EBI, and the Wellcome Sanger Institute.
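One route for programmatic access is NCBI's Entrez E-utilities, where the conserved domain records are exposed under the database name `cdd`. The sketch below shows how an ESearch request might be assembled and sent using only the Python standard library. The helper names `build_esearch_url` and `search_cdd` are hypothetical, the search term is an arbitrary example, and the response handling assumes the JSON layout returned by ESearch with `retmode=json`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=5):
    """Build an ESearch URL that queries the Entrez 'cdd' database."""
    params = {"db": "cdd", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def search_cdd(term, retmax=5, timeout=30):
    """Send the request and return the list of matching record IDs.

    Requires network access; assumes the response contains
    an 'esearchresult' object with an 'idlist' array.
    """
    with urlopen(build_esearch_url(term, retmax), timeout=timeout) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


# Building the URL needs no network access.
example_url = build_esearch_url("kinase domain", retmax=3)
```

Calling `search_cdd("kinase domain")` would perform the actual request. Scripts that issue many requests should follow NCBI's usage guidelines on request rate.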

Applications and Use Cases

CDD supports annotation in genome efforts such as the Human Genome Project and the 1000 Genomes Project, and at large-scale sequencing centers like the Broad Institute and the Wellcome Sanger Institute. It is employed in studies of protein evolution at laboratories at Harvard Medical School, Stanford University School of Medicine, and the Max Planck institutes, and in structural interpretation efforts that draw on resources such as the Protein Data Bank and EMDB. Clinical and translational research uses CDD-derived annotations in pipelines developed at institutions like Mayo Clinic, Cleveland Clinic, and the National Institutes of Health Clinical Center, and in computational projects affiliated with IBM Research and Google DeepMind.
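In an annotation pipeline, domain hits are typically filtered before they are attached to a protein record. The sketch below assumes the hits have already been exported in the standard 12-column BLAST tabular layout (the format BLAST+ programs write with `-outfmt 6`). The sample rows, the model names, the E-value cutoff, and the greedy overlap rule are all illustrative assumptions, not the reporting logic of any NCBI tool.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    query: str
    model: str
    start: int      # query start (1-based, inclusive)
    end: int        # query end (1-based, inclusive)
    evalue: float
    bitscore: float


def parse_hits(text, max_evalue=0.01):
    """Parse 12-column tabular hits and keep those at or below max_evalue."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns: {line!r}")
        hit = DomainHit(fields[0], fields[1], int(fields[6]), int(fields[7]),
                        float(fields[10]), float(fields[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


def non_overlapping(hits):
    """Greedily keep the highest-scoring hits that do not overlap on a query."""
    chosen = []
    for hit in sorted(hits, key=lambda h: h.bitscore, reverse=True):
        if all(hit.end < kept.start or hit.start > kept.end
               for kept in chosen if kept.query == hit.query):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: (h.query, h.start))


# Invented rows; 'modelA' through 'modelD' are placeholder identifiers.
SAMPLE = "\n".join([
    "prot1\tmodelA\t35.0\t100\t60\t5\t10\t110\t1\t100\t1e-30\t120.0",
    "prot1\tmodelB\t30.0\t80\t50\t6\t50\t130\t1\t80\t1e-10\t60.0",
    "prot1\tmodelC\t28.0\t90\t60\t5\t200\t290\t1\t90\t1e-05\t45.0",
    "prot1\tmodelD\t25.0\t40\t30\t0\t300\t340\t1\t40\t5.0\t20.0",
])
kept_models = [hit.model for hit in non_overlapping(parse_hits(SAMPLE))]
```

With these sample rows, `modelD` fails the E-value cutoff and `modelB` is dropped because it overlaps the higher-scoring `modelA`, leaving `modelA` and `modelC`.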

Integration with Other Resources

CDD integrates with major sequence and structural databases including UniProt, RefSeq, GenBank, and the Protein Data Bank, and interoperates with annotation frameworks like Gene Ontology and Ensembl. CDD outputs link to domain model resources such as Pfam and SMART, and to analysis platforms used by the European Bioinformatics Institute, the Broad Institute, and national research infrastructures sponsored by the National Institutes of Health and the Wellcome Trust. Cross-resource coordination aligns with standards discussed at meetings and workshops held by Cold Spring Harbor Laboratory, the Gordon Research Conferences, and the International Society for Computational Biology.

Category:Biological databases