Protein Information Resource

Name	Protein Information Resource
Formation	1984
Headquarters	Georgetown University Medical Center; University of Delaware
Founder	National Biomedical Research Foundation
Type	Bioinformatics resource (member of the UniProt consortium)

Contents

History
Mission and Scope
Databases and Tools
Data Sources and Curation
Access and Implementation
Impact and Applications

The Protein Information Resource is a long-standing bioinformatics resource that provides curated protein sequence annotations, functional assignments, and analysis tools. Established in the 1980s as computational biology expanded, it has worked with many of the laboratories and institutions that shaped modern molecular biology, bioinformatics, and structural biology. The Resource integrates contributions from academic groups, government centers, and international consortia to serve experimentalists, computational biologists, and clinical researchers.

History

The project traces its origins to the Atlas of Protein Sequence and Structure, the protein sequence collection compiled under Margaret Oakley Dayhoff at the National Biomedical Research Foundation (NBRF), which established the Protein Information Resource (PIR) in 1984 at Georgetown University. Its development paralleled milestones such as the establishment of GenBank and the rise of the Human Genome Project, and it drew on annotation methods from the European Molecular Biology Laboratory and the Swiss Institute of Bioinformatics. Over the following decades the initiative adapted to the expansion of the Protein Data Bank and to efforts at the National Center for Biotechnology Information that standardized sequence submission and access. In 2002 PIR joined the European Bioinformatics Institute and the Swiss Institute of Bioinformatics to form the UniProt consortium. The Resource later added a site at the University of Delaware, and its funding has included programs at the National Institutes of Health and the National Science Foundation.

Mission and Scope

The Resource aims to provide the scientific community with reliable protein information services for annotation, classification, and function prediction. Its scope encompasses curated protein sequence records, motif and family definitions, and sequence-analysis software used by researchers at institutions such as Harvard University, the Massachusetts Institute of Technology, and Stanford University, and by international partners such as the European Bioinformatics Institute. The mission follows community standards developed in workshops involving organizations such as the International Society for Computational Biology, as well as reference efforts by groups at Cold Spring Harbor Laboratory and leading medical centers.

Databases and Tools

The initiative maintains several integrated components: curated sequence entries (historically the PIR Protein Sequence Database), classification systems for protein families and domains (such as PIRSF), profile libraries for motif detection, and web-accessible analysis utilities. These were developed alongside tools and databases such as BLAST at the National Center for Biotechnology Information, profile methods influenced by algorithms from groups at the University of Manchester and the European Molecular Biology Laboratory, and annotation frameworks similar to those used by UniProt and the Gene Ontology consortium. Software offerings have included sequence similarity search tools, conserved domain viewers, and batch retrieval services used by researchers at institutions including Johns Hopkins University, the University of California, San Diego, and the Scripps Research Institute.
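Pattern-based motif detection, one of the techniques named above, can be illustrated with a minimal sketch. This is a generic example, not the Resource's own software: it assumes motifs are written in basic PROSITE-style pattern syntax, uses hypothetical helper names (`prosite_to_regex`, `find_motifs`), and supports only the elements x, [..], {..}, (n) and (n,m) repeat counts, and the < and > terminal anchors at the pattern ends. The sample pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation consensus; the sequence is a made-up toy.

```python
import re

# Hypothetical helpers illustrating pattern-based motif scanning.
# Assumption: motifs use basic PROSITE-style syntax; this is not PIR's own software.


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE-style pattern to a Python regular expression.

    Supported elements: x (any residue), [..] (allowed residues),
    {..} (forbidden residues), (n) / (n,m) repeat counts, and the
    < (N-terminus) and > (C-terminus) anchors at the pattern ends.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            piece = "."
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"
        else:
            piece = core  # a single residue or a [..] set is already valid regex
        if low:
            piece += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + piece + suffix)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every hit, including overlaps."""
    # Wrapping the pattern in a lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# PROSITE N-glycosylation consensus applied to a toy sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
print(find_motifs("MKNGSAVNPSTNESK", "N-{P}-[ST]-{P}."))  # [(3, 'NGSA'), (12, 'NESK')]
```

Profile libraries score sequences with position-specific weights rather than an all-or-nothing match; a regular-expression pattern like this one is the simpler relative of that approach.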

Data Sources and Curation

Primary inputs include protein sequence data from public repositories such as GenBank and structural mappings from the Protein Data Bank. Curation draws on the published literature, including journals from Nature Publishing Group and Cell Press and the Proceedings of the National Academy of Sciences. Expert curators reconcile experimental reports from laboratories at universities including Yale University, the University of Cambridge, and the University of Tokyo with computational predictions developed in collaboration with groups at Carnegie Mellon University and the University of Washington. Quality-control procedures reflect practices recommended at meetings involving the National Institutes of Health and standards advanced by consortia such as UniProt and the Gene Ontology project.

Access and Implementation

Services are delivered through web portals, programmatic interfaces, and downloadable data packages compatible with bioinformatics environments popularized at the Broad Institute and with software ecosystems such as Bioconductor and Galaxy. Support for academic and industrial users relies on institutional computing resources and cloud collaborations, comparable to genomics deployments on Amazon Web Services and to strategy studies conducted at Lawrence Berkeley National Laboratory. Training and outreach have been coordinated through workshops at conferences including the Intelligent Systems for Molecular Biology (ISMB) meetings and summer schools hosted by universities such as the University of California, Berkeley.
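The format of the downloadable data packages is not specified here; FASTA is a common choice for protein sequence downloads, so the minimal reader below assumes it. The helper name `read_fasta` is hypothetical, and the sketch uses only the Python standard library to stream (header, sequence) pairs; it is not the Resource's own loader.

```python
import io
from typing import Iterator, Optional, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Hypothetical helper; assumes standard FASTA (a '>' header line
    followed by one or more sequence lines). Lines before the first
    header and blank lines are ignored.
    """
    header: Optional[str] = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record at end of input.
    if header is not None:
        yield header, "".join(chunks)


# Usage with an in-memory file; a downloaded file opened with open() works the same way.
example = io.StringIO(">seq1 toy protein\nMKNGSA\nVNPST\n\n>seq2\nNESK\n")
print(list(read_fasta(example)))
# [('seq1 toy protein', 'MKNGSAVNPST'), ('seq2', 'NESK')]
```

Reading the file as a generator keeps memory use proportional to one record rather than the whole download.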

Impact and Applications

The Resource has underpinned research in structural biology, comparative genomics, functional genomics, and translational studies at organizations ranging from academic laboratories to biotechnology companies. Its annotations and motif libraries contributed to discoveries reported in venues such as Nature, Science, and Cell, and have been incorporated into pipelines at pharmaceutical firms and public health laboratories, including those associated with the Centers for Disease Control and Prevention. Use cases include genome annotation projects comparable to the Human Microbiome Project, evolutionary studies linked to work by Smithsonian Institution researchers, and protein engineering efforts influenced by collaborations with institutes such as the Massachusetts Institute of Technology and the California Institute of Technology.
