TIGRFAMs
Name	TIGRFAMs
Type	Database
Scope	Protein families
Owner	J. Craig Venter Institute
Released	2001

Contents

Overview
History and development
Construction and classification of models
Applications and usage
Integration with bioinformatics resources
Limitations and challenges

TIGRFAMs is a curated collection of protein family models designed for the classification and functional annotation of microbial proteins. The resource provides hidden Markov models (HMMs) and associated functional annotation to support genome annotation, comparative genomics, and metagenomics projects. It was developed at The Institute for Genomic Research and its successor, the J. Craig Venter Institute, was later transferred to the National Center for Biotechnology Information, and has been used alongside other community resources to improve automated annotation pipelines.

Overview

TIGRFAMs offers profile HMMs that capture conserved sequence features of protein families, enabling sensitive detection of homologs across bacterial, archaeal, and viral genomes. The resource complements databases such as Pfam, InterPro, KEGG, UniProt, and RefSeq by providing manually curated models, each with a specific functional assignment and its own score cutoffs. Model annotations are used by projects at institutions such as the National Institutes of Health, the European Molecular Biology Laboratory, the Broad Institute, and the Sanger Institute, and by consortia including the Human Microbiome Project. Curators draw on evidence published in journals such as Nature, Science, Nucleic Acids Research, and Genome Research, and on data from repositories such as GenBank and the Protein Data Bank.

History and development

The TIGRFAMs collection originated in the early 2000s at The Institute for Genomic Research (TIGR), founded by J. Craig Venter, which later merged into the J. Craig Venter Institute, with collaborators from organizations such as The Scripps Research Institute. Early work built on profile HMM methodology and on the HMMER software developed by Sean Eddy's group, then at Washington University in St. Louis. Over successive releases, TIGRFAMs incorporated curated assignments informed by comparative analyses from large projects such as the Human Genome Project, the TerraGenome Project, and the Global Ocean Sampling expedition, and from studies by teams at Lawrence Berkeley National Laboratory and Argonne National Laboratory. Funding and collaboration involved agencies such as the National Science Foundation and the National Institute of Allergy and Infectious Diseases, and commercial partners including Celera Genomics.

Construction and classification of models

Models in the collection are built from multiple sequence alignments of experimentally characterized sequences or high-confidence computational clusters drawn from databases such as UniProtKB/Swiss-Prot and RefSeq, and from large-scale datasets generated by groups at the European Molecular Biology Laboratory and the Broad Institute. Alignment tools and phylogenetic methods from developers at the University of California, Santa Cruz and the University of Washington are used to prepare the alignments, HMMs are built with software such as HMMER, and the resulting models are validated by benchmarking against curated sets from Swiss-Prot and against community standards promoted by BioPerl and the Open Bioinformatics Foundation. Classification schemes annotate models with functional categories that map to ontologies maintained by organizations including the Gene Ontology Consortium and the Sequence Ontology, with cross-references to pathway resources such as MetaCyc, Reactome, and KEGG PATHWAY. Each model's score thresholds (cutoffs), such as its trusted and noise cutoffs, are determined using receiver operating characteristic analyses and expert curation of the kind practiced by groups at the European Bioinformatics Institute and the National Center for Biotechnology Information.
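The receiver operating characteristic analysis mentioned above can be illustrated with a minimal sketch. It is not the curators' actual procedure: it assumes that bit scores are already available for known family members and known non-members, and it picks the cutoff that maximizes Youden's J statistic (true positive rate minus false positive rate). The function name pick_cutoff and all scores are hypothetical.

```python
from typing import Sequence


def pick_cutoff(member_scores: Sequence[float],
                nonmember_scores: Sequence[float]) -> tuple[float, float, float]:
    """Return (cutoff, tpr, fpr) maximizing Youden's J = TPR - FPR.

    A sequence is called a family member when its score >= cutoff.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")

    best = None
    # Candidate cutoffs are the observed scores, scanned from high to low.
    candidates = sorted(set(member_scores) | set(nonmember_scores), reverse=True)
    for cutoff in candidates:
        tpr = sum(s >= cutoff for s in member_scores) / len(member_scores)
        fpr = sum(s >= cutoff for s in nonmember_scores) / len(nonmember_scores)
        j = tpr - fpr
        # Strict '>' keeps the highest (most conservative) cutoff on ties.
        if best is None or j > best[0]:
            best = (j, cutoff, tpr, fpr)

    _, cutoff, tpr, fpr = best
    return cutoff, tpr, fpr


# Made-up bit scores, for illustration only.
members = [310.2, 295.7, 280.1, 150.4]
nonmembers = [120.0, 95.3, 60.8, 41.5]
cutoff, tpr, fpr = pick_cutoff(members, nonmembers)
print(f"cutoff={cutoff} TPR={tpr} FPR={fpr}")
```

With cleanly separated scores like these, the chosen cutoff is the lowest member score. In real families the two distributions often overlap, which is where expert review of borderline sequences comes in.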

Applications and usage

TIGRFAMs models are applied in automated annotation pipelines used by genome centers such as the Joint Genome Institute, the Sanger Institute, and Los Alamos National Laboratory, and by projects such as the Human Microbiome Project and the Earth Microbiome Project. They assist functional profiling in metagenomic studies, including the Global Ocean Sampling expedition and work by laboratories at Woods Hole Oceanographic Institution and Scripps Institution of Oceanography. Clinical microbiology groups at institutions such as Mayo Clinic, Johns Hopkins University, and Massachusetts General Hospital use the models indirectly, through annotation systems integrated with RefSeq and GenBank. TIGRFAMs models also contribute to comparative genomics analyses in studies published by teams at Harvard University, Stanford University, and MIT, and underpin annotation in platforms such as IMG/M, MicrobesOnline, and PATRIC.
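As a rough illustration of how a pipeline might consume such models, the sketch below parses search results in the tabular layout written by HMMER's hmmsearch with the --tblout option (sequence name in column 1, model accession in column 4, full-sequence bit score in column 6), keeps hits at or above a per-model cutoff, and assigns each protein its best-scoring model. The sample rows, the model accessions TIGR_X1 and TIGR_X2, and the cutoff values are invented for the example. HMMER can also apply cutoffs stored in the model file itself, for instance with --cut_tc, so the explicit filter here only makes the logic visible.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    protein: str   # target sequence name
    model: str     # query (model) accession
    score: float   # full-sequence bit score


def parse_tblout(text: str) -> list[Hit]:
    """Parse hmmsearch --tblout text into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # The last column is a free-text description, hence maxsplit.
        fields = line.split(maxsplit=18)
        hits.append(Hit(protein=fields[0], model=fields[3], score=float(fields[5])))
    return hits


def annotate(hits: list[Hit], cutoffs: dict[str, float]) -> dict[str, str]:
    """Map each protein to its best-scoring model that passes the cutoff."""
    best: dict[str, Hit] = {}
    for hit in hits:
        cutoff = cutoffs.get(hit.model)
        if cutoff is None or hit.score < cutoff:
            continue  # unknown model or score below threshold
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    return {protein: hit.model for protein, hit in best.items()}


# Invented example data: accessions, names, and scores are hypothetical.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  (remaining columns)
prot_A - fam_one TIGR_X1 1.2e-80 265.3 0.1 1.5e-80 265.0 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
prot_A - fam_two TIGR_X2 3.0e-45 160.0 0.0 3.5e-45 159.8 0.0 1.0 1 0 0 1 1 1 1 hypothetical protein
prot_B - fam_two TIGR_X2 2.0e-20 90.0 0.2 2.4e-20 89.7 0.2 1.1 1 0 0 1 1 1 1 hypothetical protein
prot_C - fam_two TIGR_X2 5.0e-52 180.5 0.0 6.0e-52 180.2 0.0 1.0 1 0 0 1 1 1 1 hypothetical protein
"""
CUTOFFS = {"TIGR_X1": 200.0, "TIGR_X2": 150.0}  # hypothetical cutoffs

hits = parse_tblout(SAMPLE_TBLOUT)
annotations = annotate(hits, CUTOFFS)
print(annotations)
```

In this example prot_B receives no annotation because its only hit scores below the cutoff for its model, which is the behavior that per-model cutoffs are meant to provide.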

Integration with bioinformatics resources

The resource interoperates with databases and tools maintained by organizations including UniProt, NCBI, EMBL-EBI, KEGG, and the Gene Ontology Consortium. Pipelines such as Prokka, RAST, and MAKER, and workflow systems developed under the Open Bioinformatics Foundation, incorporate TIGRFAMs HMMs to improve the specificity of annotation. Cross-references map TIGRFAMs models to entries in Pfam, InterPro, and the COG database, and to pathway resources such as MetaCyc, supporting multi-database queries by researchers at the European Molecular Biology Laboratory and the Broad Institute. Integration into cloud platforms and workflows used by Amazon Web Services research teams, and into grid infrastructures such as XSEDE, enables large-scale metagenomics analyses.
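The cross-reference idea can be sketched as a small lookup structure. The example below is an assumption about one convenient in-memory layout, not a description of any distributed file format: each model accession maps to identifiers in other databases, and a reverse index answers the question of which models point at a given external entry. All identifiers are placeholders.

```python
from collections import defaultdict

# Hypothetical cross-references: model accession -> database -> external IDs.
XREFS = {
    "TIGR_X1": {"pfam": ["PF_Y1"], "interpro": ["IPR_Z1"], "go": ["GO_A", "GO_B"]},
    "TIGR_X2": {"pfam": ["PF_Y1", "PF_Y2"], "interpro": ["IPR_Z2"], "go": ["GO_B"]},
}


def build_reverse_index(xrefs: dict[str, dict[str, list[str]]]) -> dict[tuple[str, str], set[str]]:
    """Index (database, external_id) -> set of model accessions."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for model, databases in xrefs.items():
        for database, external_ids in databases.items():
            for external_id in external_ids:
                index[(database, external_id)].add(model)
    return dict(index)


def models_for(index: dict[tuple[str, str], set[str]],
               database: str, external_id: str) -> list[str]:
    """Return the sorted model accessions linked to one external entry."""
    return sorted(index.get((database, external_id), set()))


INDEX = build_reverse_index(XREFS)
shared_pfam = models_for(INDEX, "pfam", "PF_Y1")
print(shared_pfam)
```

Here two models share one Pfam placeholder entry, reflecting the common situation in which a broad domain-level entry covers several narrower, function-specific families.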

Limitations and challenges

Despite extensive curation, models can suffer from annotation transfer errors, a problem shared with UniProtKB, RefSeq, and KEGG, when the underlying sequence databases contain misannotations submitted to GenBank by high-throughput projects. Detection of distant homologs can be limited by sequence divergence and domain rearrangements, as documented in studies from Cold Spring Harbor Laboratory and the Max Planck Institute. Maintenance challenges include resource allocation at institutes such as the J. Craig Venter Institute and dependence on funding from agencies such as the National Institutes of Health and the National Science Foundation. Finally, reconciling conflicting functional assignments across resources such as InterPro, Pfam, and the COG database requires community coordination involving participants from the European Bioinformatics Institute, the National Center for Biotechnology Information, and academic labs at the University of Cambridge, the University of Oxford, and the University of California, Berkeley.

Category:Biological databases Category:Protein families