SMART (database) — LLMpedia

SMART (database)
Name	SMART
Title	SMART (database)
Developed by	European Molecular Biology Laboratory (EMBL)
Initial release	1998
Type	Biological sequence and domain resource
Access	Public web portal
License	Open access

Contents

Introduction
History and Development
Data Content and Curation
Access and Tools
Applications and Impact
Criticisms and Limitations


SMART (Simple Modular Architecture Research Tool) is a bioinformatics resource for identifying and annotating protein domains and signaling modules across taxa. It integrates curated domain models with sequence alignments and phylogenetic profiles to support research in molecular biology, genomics, and proteomics, and it is widely used alongside other databases and tools for protein function prediction, comparative genomics, and pathway analysis.

Introduction

SMART originated as a specialized resource for identifying and annotating the protein domains, motifs, and signaling modules found in organisms studied by projects such as the Human Genome Project, the mouse, Arabidopsis thaliana, and Saccharomyces cerevisiae genome projects, and microbial sequencing initiatives. It provides domain architectures, multiple sequence alignments, and graphical displays that interoperate with resources such as UniProt, the Protein Data Bank, Pfam, InterPro, NCBI, and Ensembl. Researchers at institutions including the European Molecular Biology Laboratory, Stanford University, the Massachusetts Institute of Technology, the Max Planck Society, and Cold Spring Harbor Laboratory have cited SMART in studies spanning cell signaling, developmental biology, and disease genetics.

History and Development

The database traces its development to research groups that studied signaling and modular domain evolution during the 1990s, contemporaneous with projects at the European Bioinformatics Institute and collaborations with labs at the University of Cambridge, Harvard University, the University of California, Berkeley, and Kyoto University. Early milestones include the incorporation of curated models for SH2 and SH3 domains studied in work from labs affiliated with the Howard Hughes Medical Institute; later comparative analyses drew on datasets from the International HapMap Project and microbial genome initiatives coordinated by the Joint Genome Institute. Subsequent releases expanded taxonomic coverage to include sequences from projects by the Wellcome Sanger Institute, the Wellcome Trust, the National Institutes of Health, the National Human Genome Research Institute, and national sequencing centers in Japan, China, and Germany. Integration efforts linked SMART with ontologies and standards such as the Gene Ontology and the Sequence Ontology, and with pathway databases such as KEGG, Reactome, and BioCyc.

Data Content and Curation

SMART curators maintain domain models represented as profile hidden Markov models built from multiple sequence alignments of experimentally characterized families, including domains studied in landmark papers from labs at Johns Hopkins University, Columbia University, Yale University, the University of Oxford, and Uppsala University. The dataset incorporates sequences annotated in RefSeq, cross-references to entries in UniProtKB/Swiss-Prot, and structural mappings to entries in the Protein Data Bank, developed through collaborations with structural biology groups at the European Synchrotron Radiation Facility and cryo-EM teams linked to the Max Planck Institute for Biophysical Chemistry. Curation follows standardized nomenclature set by bodies such as the HUGO Gene Nomenclature Committee of the Human Genome Organisation, and draws on comparative datasets from consortia such as the 1000 Genomes Project and the ENCODE Project to refine domain boundaries and functional inferences.
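
Domain boundaries matter because several models can match overlapping stretches of the same protein. The sketch below shows one simple way to turn a list of domain hits into a non-overlapping domain architecture. It is a hypothetical illustration, not SMART's actual pipeline: the `DomainHit` record, the "most significant hit first" greedy rule, the `1e-3` E-value cutoff, and the coordinates in the example are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One predicted domain on a protein (hypothetical record; 1-based, inclusive)."""
    name: str
    start: int
    end: int
    evalue: float


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_architecture(hits: list[DomainHit], max_evalue: float = 1e-3) -> list[DomainHit]:
    """Greedy resolution (an assumed rule): keep the most significant hits first,
    drop any hit that overlaps one already kept, then order survivors by position."""
    kept: list[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            continue  # not significant enough to report
        if any(overlaps(hit, k) for k in kept):
            continue  # boundary conflict with a better-scoring domain
        kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


def architecture_string(hits: list[DomainHit]) -> str:
    """Render the N- to C-terminal architecture, e.g. 'SH3-SH2-TyrKc'."""
    return "-".join(h.name for h in hits)


# Invented coordinates and E-values for a Src-like kinase.
hits = [
    DomainHit("SH3", 80, 140, 1e-15),
    DomainHit("SH2", 150, 245, 1e-20),
    DomainHit("TyrKc", 265, 515, 1e-90),
    DomainHit("S_TKc", 270, 510, 1e-30),  # overlaps TyrKc but scores worse
    DomainHit("UBA", 600, 640, 0.5),      # fails the E-value cutoff
]
print(architecture_string(resolve_architecture(hits)))  # SH3-SH2-TyrKc
```

Resolving conflicts by significance is only one possible policy; a curated resource may instead apply per-family thresholds or explicit rules about which domains may coexist.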

Access and Tools

Users access the resource through a web portal maintained by teams associated with institutions such as Heidelberg University, Karolinska Institutet, and the University of Tübingen. Programmatic access is offered through web-service interfaces modeled on EBI Web Services, and results can be used in pipelines built with the Galaxy Project, Bioconductor, HMMER, and BLAST. The visualization and annotation tools draw on interfaces developed at the Wellcome Sanger Institute and use formats compatible with standards from the Global Alliance for Genomics and Health and with MIAME-style metadata practices. Educational and outreach collaborations have involved the European Molecular Biology Organization, the American Society for Cell Biology, the Federation of European Biochemical Societies, and regional bioinformatics networks in Brazil, India, and South Africa.
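
The article does not specify the interface of SMART's own programmatic service, so the sketch below takes a different, clearly assumption-labelled route tied to the HMMER interoperability mentioned above: it parses the per-domain table that HMMER3's `hmmscan --domtblout` writes when protein sequences are searched against a library of profile HMMs. Whether a given set of SMART models is available for local searching is not established here; the sample line, the protein name `MYPROT`, and the `DomtblHit` record are hypothetical.

```python
from typing import NamedTuple


class DomtblHit(NamedTuple):
    """Selected columns of one hmmscan --domtblout row (hypothetical record)."""
    profile: str      # column 1: target (profile HMM) name
    sequence: str     # column 4: query (protein) name
    i_evalue: float   # column 13: independent E-value of this domain
    score: float      # column 14: domain bit score
    env_from: int     # column 20: envelope start on the sequence
    env_to: int       # column 21: envelope end on the sequence


def parse_domtblout(text: str) -> list[DomtblHit]:
    """Parse HMMER3 hmmscan --domtblout text into per-domain records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        # 22 whitespace-delimited columns; the 23rd (description) may contain spaces.
        fields = line.split(maxsplit=22)
        hits.append(
            DomtblHit(
                profile=fields[0],
                sequence=fields[3],
                i_evalue=float(fields[12]),
                score=float(fields[13]),
                env_from=int(fields[19]),
                env_to=int(fields[20]),
            )
        )
    return hits


# A made-up row in domtblout layout, for illustration only.
sample = (
    "# target name  accession  tlen  query name ...\n"
    "SH2  -  84  MYPROT  -  536  1.2e-20  72.1  0.1  1  1  3.1e-23  6.5e-20  70.0  0.1  "
    "1  84  148  232  146  235  0.95  Src homology 2 domain\n"
)
for hit in parse_domtblout(sample):
    print(hit.profile, hit.sequence, hit.env_from, hit.env_to, hit.i_evalue)
```

Splitting with `maxsplit=22` keeps the free-text description in one field, so descriptions containing spaces do not shift the numeric columns.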

Applications and Impact

Research groups at Massachusetts General Hospital, the Broad Institute, Scripps Research, and the Dana-Farber Cancer Institute, as well as clinical genomics centers such as the Mayo Clinic and Cleveland Clinic, have applied SMART in studies of signaling networks. It supports annotation pipelines in comparative genomics projects led by the Broad Institute and evolutionary studies published by teams at Princeton University, the University of Chicago, and the University of California, San Diego. The database underpins domain-centric analyses in cancer genomics consortia such as The Cancer Genome Atlas and functional annotation in microbial ecology efforts tied to the Human Microbiome Project. SMART-derived annotations have also informed work in Nobel Prize-winning laboratories and in translational programs at the National Cancer Institute.
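
A common domain-centric step in such analyses is asking which domain, if any, each observed mutation falls into. The sketch below counts mutated residue positions per domain. It is a minimal illustration under stated assumptions, not a method attributed to any of the projects named above; the `DOMAINS` intervals and the mutation positions are invented.

```python
from collections import Counter
from typing import Optional

# Hypothetical domain intervals for one protein: (name, start, end), 1-based inclusive.
DOMAINS = [("SH3", 80, 140), ("SH2", 150, 245), ("TyrKc", 265, 515)]


def domain_at(position: int, domains=DOMAINS) -> Optional[str]:
    """Return the domain covering a residue position, or None if it lies between domains."""
    for name, start, end in domains:
        if start <= position <= end:
            return name
    return None


def mutations_per_domain(positions, domains=DOMAINS) -> Counter:
    """Count mutated positions per domain; uncovered positions are tallied as 'outside'."""
    counts = Counter()
    for pos in positions:
        counts[domain_at(pos, domains) or "outside"] += 1
    return counts


# Invented positions of missense mutations observed in a set of tumor samples.
mutated_positions = [12, 95, 200, 210, 300, 520]
print(mutations_per_domain(mutated_positions))
```

A linear scan is adequate for the handful of domains in a single protein; for proteome-scale data an interval tree or sorted-boundary binary search would scale better.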

Criticisms and Limitations

Researchers at the University of California, San Francisco, Imperial College London, the University of Edinburgh, and independent bioinformatics groups have highlighted several limitations: coverage gaps relative to resources such as Pfam and InterPro, domain boundary definitions that are inconsistent with structural data in the Protein Data Bank, and update cycles that lag behind the output of large sequencing efforts such as the 100,000 Genomes Project and national initiatives funded by the NIH. Limitations in automated annotation pipelines have prompted recommendations from bodies including the Gene Ontology Consortium, along with calls for tighter integration with community curation models such as those used by Wikidata and collaborative platforms promoted by the Open Bioinformatics Foundation.

Category:Biological databases