LLMpedia: The first transparent, open encyclopedia generated by LLMs

Swiss-Prot

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Swiss-Prot
Name: Swiss-Prot
Type: Manually curated protein sequence database
Country: Switzerland
Established: 1986
Maintained by: UniProtKB/Swiss-Prot component of the UniProt Consortium
Formats: FASTA, XML, RDF

Swiss-Prot is a manually curated protein sequence and annotation resource that originated in Switzerland and is maintained as the reviewed section of the UniProt Knowledgebase (UniProtKB). It provides high-quality, minimally redundant protein entries with expert annotations on function, structure, post-translational modifications, and variants. It is widely used by researchers at institutions such as the European Molecular Biology Laboratory, the Swiss Institute of Bioinformatics, the National Center for Biotechnology Information, and the European Bioinformatics Institute, and by companies such as Genentech, Novartis, and Roche.

History

Swiss-Prot was created in 1986 by Amos Bairoch at the University of Geneva and was subsequently developed in collaboration with the European Molecular Biology Laboratory (EMBL) Data Library, which later became the European Bioinformatics Institute. The Swiss Institute of Bioinformatics, founded in 1998, took over the Swiss side of its maintenance. Early milestones include cross-referencing to structure and sequence repositories such as the Protein Data Bank and the nucleotide databases of the International Nucleotide Sequence Database Collaboration, and, in 1996, the introduction of TrEMBL, a computer-annotated supplement created to keep pace with the sequence output of Human Genome Project era initiatives. In 2002 Swiss-Prot became a core component of the UniProt Consortium, formed by the Swiss Institute of Bioinformatics, the European Bioinformatics Institute, and the Protein Information Resource, with funding that included the National Institutes of Health.

Content and Curation

Entries are curated by expert biocurators at the UniProt Consortium member institutions: the Swiss Institute of Bioinformatics, the European Bioinformatics Institute (part of the European Molecular Biology Laboratory), and the Protein Information Resource. Curators integrate experimental evidence from the peer-reviewed literature, including journals such as Nature, Science, Cell, and The Lancet, and from repositories including the Protein Data Bank, GenBank, and organism-specific resources such as FlyBase, WormBase, and the Saccharomyces Genome Database. Each entry includes sequence provenance, a functional description, catalytic activity annotated with Enzyme Commission (EC) numbers where applicable, and curated notes drawn from primary studies.

Data Model and Annotation Standards

The database uses a structured data model aligned with community standards. Functional annotation is cross-referenced to terms maintained by the Gene Ontology Consortium, and other cross-references point to ontologies developed under the Open Biological and Biomedical Ontology (OBO) Foundry. Annotations use controlled vocabularies, for example for keywords and subcellular location, and variant descriptions follow the nomenclature recommended by the Human Genome Variation Society. Each annotation carries evidence tags, expressed as Evidence and Conclusion Ontology (ECO) codes, that distinguish experimentally supported statements from those inferred by similarity or by curators, and annotations are exchanged with databases such as RefSeq and Ensembl.
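The role of evidence tagging can be illustrated with a minimal sketch. The class and field names below are hypothetical and do not reproduce the UniProt schema; the only external fact assumed is that the Evidence and Conclusion Ontology code ECO:0000269 denotes experimental evidence used in a manual assertion, while ECO:0000250 denotes sequence similarity evidence used in a manual assertion.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: ECO:0000269 marks experimental evidence used in manual assertion.
EXPERIMENTAL = "ECO:0000269"


@dataclass(frozen=True)
class Evidence:
    """One evidence tag attached to an annotation (hypothetical structure)."""
    eco_code: str      # e.g. "ECO:0000269"
    source: str = ""   # free-text pointer to the supporting source, if any


@dataclass
class Annotation:
    """A single annotation statement with its evidence tags."""
    kind: str          # e.g. "FUNCTION" or "SUBCELLULAR LOCATION"
    text: str
    evidence: List[Evidence] = field(default_factory=list)

    def is_experimental(self) -> bool:
        # True if at least one tag carries the experimental evidence code.
        return any(e.eco_code == EXPERIMENTAL for e in self.evidence)


def experimental_only(annotations: List[Annotation]) -> List[Annotation]:
    """Keep only annotations backed by experimental evidence."""
    return [a for a in annotations if a.is_experimental()]
```

A filter of this kind is one way a downstream user might restrict an analysis to experimentally supported statements; whether that is appropriate depends on the application.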

Access and Tools

Public access is provided through the UniProt website, developed jointly by the European Bioinformatics Institute, the Swiss Institute of Bioinformatics, and the Protein Information Resource. Programmatic retrieval is supported through a REST API and bulk FTP downloads, and the data can be consumed by tools from projects such as Bioconductor, Galaxy, and Cytoscape. Sequence analysis and visualization commonly combine Swiss-Prot data with software such as BLAST, Clustal, and PyMOL, and with workflows on platforms maintained by the European Molecular Biology Laboratory and the National Center for Biotechnology Information.
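As a hedged illustration of programmatic retrieval, the sketch below builds request URLs for the UniProt REST service using only the Python standard library. It assumes the base URL https://rest.uniprot.org, the uniprotkb/search endpoint, and the reviewed:true query field, which should be checked against the current UniProt documentation. The helper names are hypothetical, and the network call is kept in a separate function so the URL builders can be used offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the UniProt REST service; verify against current docs.
UNIPROT_REST = "https://rest.uniprot.org"


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """Return the URL for a single UniProtKB entry in the given format."""
    return f"{UNIPROT_REST}/uniprotkb/{accession}.{fmt}"


def swissprot_search_url(query: str, fmt: str = "fasta", size: int = 25) -> str:
    """Return a search URL restricted to reviewed (Swiss-Prot) entries."""
    params = {
        "query": f"({query}) AND (reviewed:true)",
        "format": fmt,
        "size": size,
    }
    return f"{UNIPROT_REST}/uniprotkb/search?{urlencode(params)}"


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download a resource as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch(entry_url("P69905")) would request one entry in FASTA format, and swissprot_search_url("organism_id:9606") would build a search limited to reviewed human entries.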

As the reviewed section of UniProtKB, Swiss-Prot sits alongside TrEMBL, the unreviewed section whose entries are annotated automatically. Entries are cross-linked to structural data in the Protein Data Bank, genomic coordinates in Ensembl and RefSeq, functional classifications in the Gene Ontology, and metabolic maps from KEGG. Data are also exchanged with organism-specific resources including FlyBase, WormBase, and MaizeGDB, with clinical variant databases such as ClinVar, and with nomenclature initiatives of the Human Genome Organisation.
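Because curated and automatically annotated entries are distributed together, downstream code often needs to tell them apart. The sketch below assumes the UniProtKB FASTA header convention, in which the line starts with ">sp|" for Swiss-Prot entries and ">tr|" for TrEMBL entries, followed by the accession and the entry name. The function and type names are hypothetical, and the header strings in the usage note are illustrative.

```python
import re
from typing import NamedTuple, Optional


class FastaHeader(NamedTuple):
    reviewed: bool     # True for Swiss-Prot ("sp"), False for TrEMBL ("tr")
    accession: str
    entry_name: str
    description: str


# Assumed header layout: >db|ACCESSION|ENTRY_NAME free-text description
_HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")


def parse_header(line: str) -> Optional[FastaHeader]:
    """Parse a UniProtKB-style FASTA header; return None if it does not match."""
    match = _HEADER.match(line.strip())
    if match is None:
        return None
    db, accession, entry_name, description = match.groups()
    return FastaHeader(db == "sp", accession, entry_name, description)
```

Calling parse_header(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha") yields a record with reviewed set to True, whereas a header beginning with ">tr|" yields reviewed set to False, so a file can be filtered down to curated entries.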

Impact and Applications

The resource underpins research across molecular biology, biotechnology, and clinical genomics at institutions including Harvard Medical School, Stanford University, Cold Spring Harbor Laboratory, and the Max Planck Society, and at pharmaceutical companies such as Pfizer and AstraZeneca. Applications include protein function prediction in studies published in journals such as Nature, Cell, and Science, biomarker discovery in projects such as The Cancer Genome Atlas, structural modeling that builds on the Protein Data Bank, and clinical variant interpretation alongside resources such as ClinVar in precision medicine programs at centers like Mayo Clinic. The curated dataset also serves as reference data for computational resources and software developed by communities around Bioconductor, Ensembl, and the European Bioinformatics Institute.

Category:Biological databases