Worldwide Protein Data Bank

Name	Worldwide Protein Data Bank
Formation	2003 (Protein Data Bank archive established 1971)
Founder	RCSB PDB, MSD-EBI (now PDBe), Protein Data Bank Japan
Type	Consortium
Headquarters	Multiple sites
Region served	Global
Leader title	Directors

The Worldwide Protein Data Bank (wwPDB) is an international consortium that maintains the Protein Data Bank (PDB), the single global archive of three-dimensional structural data for biological macromolecules, and provides free access to experimentally determined models and related experimental data. The consortium coordinates archiving, validation, and distribution through its member data centers and collaborates with scientific organizations, research institutes, and funding agencies to support structural biology, computational biology, and pharmaceutical research. The archive underpins research in fields ranging from virology and immunology to enzymology and drug discovery, serving both academic and industrial users.

History

The archive traces its origins to the Protein Data Bank, established in 1971 at Brookhaven National Laboratory as a repository for crystallographic coordinates of proteins. Its growth tracked advances in macromolecular crystallography, particularly the spread of synchrotron sources such as the Stanford Synchrotron Radiation Lightsource, the European Synchrotron Radiation Facility, and later the Advanced Photon Source. In the late 1990s, management of the US archive passed from Brookhaven to the Research Collaboratory for Structural Bioinformatics (RCSB), with support from agencies including the National Institutes of Health. The consortium model was formalized in 2003, when RCSB PDB, the Macromolecular Structure Database at the European Bioinformatics Institute (now PDBe), and Protein Data Bank Japan founded the wwPDB to harmonize efforts across North America, Europe, and Asia; BioMagResBank joined in 2006. European activities have been supported by organizations including the Wellcome Trust and the European Molecular Biology Laboratory. Later technological drivers included the rise of cryo-electron microscopy at institutions such as the Max Planck Institute of Biophysics and the adoption of integrative methods promoted by the wwPDB Integrative/Hybrid Methods Task Force.

Organization and Consortium Structure

The consortium comprises data centers that jointly manage a single archive, a cooperative arrangement comparable to multinational research infrastructures such as CERN and the Human Genome Project consortia. Its member organizations include RCSB PDB, operated by the Research Collaboratory for Structural Bioinformatics with support from National Science Foundation programs; the Protein Data Bank in Europe (PDBe) at EMBL-EBI; Protein Data Bank Japan (PDBj); BioMagResBank (BMRB); and the Electron Microscopy Data Bank (EMDB). Governance involves committees drawing expertise from universities such as the Massachusetts Institute of Technology, funding bodies such as the Medical Research Council (UK), and international scientific bodies including the International Union of Crystallography and World Health Organization advisory groups. Data policies are coordinated with related stakeholders, including literature repositories such as Europe PMC and software communities such as Rosetta Commons.

Data Content and Curation

The holdings include atomic coordinate models and structure-factor data from X-ray diffraction experiments performed at facilities such as Diamond Light Source and the Swiss Light Source; three-dimensional cryo-EM maps, archived in the Electron Microscopy Data Bank (EMDB) at EMBL-EBI; and NMR-derived restraints and chemical shifts held by BioMagResBank (BMRB), historically based at the University of Wisconsin-Madison, home of the National Magnetic Resonance Facility at Madison. Curators consult the primary literature published in journals such as Nature, Science, and the Journal of Molecular Biology to reconcile metadata and provenance, working with community standards advanced by committees of the International Union of Crystallography and the editorial boards of Acta Crystallographica. Cross-references link structures to sequence resources such as UniProt, chemical resources such as PubChem, and pathway resources such as Reactome.

Deposition and Validation Processes

Depositions are submitted through the wwPDB OneDep system, and publishers including Cell Press and the Public Library of Science require authors to deposit coordinates and supporting experimental data before publication. Validation pipelines incorporate tools developed in collaboration with software projects such as PHENIX, Coot, and CCP4, along with community validators connected to initiatives at the European Bioinformatics Institute and the National Center for Biotechnology Information; their results are summarized in wwPDB validation reports. Curatorial review engages experts from university groups, including at the University of Cambridge and Johns Hopkins University, to address issues flagged in these reports and to ensure compliance with mandates from funders such as the Wellcome Trust and the National Institute of General Medical Sciences.

Data Access, Distribution, and Services

The consortium distributes data through mirror sites and web portals maintained by its member centers, including RCSB PDB (operated by Rutgers University and the University of California, San Diego), Protein Data Bank Japan, and BioMagResBank. Services include programmatic access through APIs provided by member sites such as RCSB PDB, and the distributed files are read by visualization tools such as PyMOL, UCSF Chimera, and Jmol. Training and outreach efforts include workshops at conferences such as the Gordon Research Conferences and courses run by organizations such as Cold Spring Harbor Laboratory and the European Molecular Biology Organization.
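
As a minimal sketch of programmatic access, the Python below builds download and metadata URLs for a single entry and fetches the entry-level JSON record. It assumes RCSB PDB's public endpoint patterns (`files.rcsb.org/download/` for coordinate files and `data.rcsb.org/rest/v1/core/entry/` for metadata) as commonly documented; these may change, PDBe and PDBj expose their own services, and the helper names `entry_urls` and `fetch_entry_metadata` are hypothetical.

```python
from __future__ import annotations

import json
import re
import urllib.request

# Classic PDB identifiers: four characters, a digit 1-9 followed by three
# letters or digits (e.g. "4HHB"). Extended identifiers are not handled here.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")

# Assumed RCSB PDB endpoint patterns; PDBe and PDBj offer equivalent services.
FILES_BASE = "https://files.rcsb.org/download"
DATA_API_BASE = "https://data.rcsb.org/rest/v1/core/entry"


def entry_urls(pdb_id: str) -> dict[str, str]:
    """Build coordinate-file and metadata URLs for one entry (no network)."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    pid = pdb_id.upper()  # identifiers are case-insensitive; normalize for URLs
    return {
        "mmcif": f"{FILES_BASE}/{pid}.cif",      # PDBx/mmCIF coordinate file
        "pdb": f"{FILES_BASE}/{pid}.pdb",        # legacy format; absent for very large entries
        "entry_json": f"{DATA_API_BASE}/{pid}",  # entry-level metadata as JSON
    }


def fetch_entry_metadata(pdb_id: str, timeout: float = 30.0) -> dict:
    """Download and decode the entry-level JSON record (requires network access)."""
    url = entry_urls(pdb_id)["entry_json"]
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `entry_urls("4hhb")["mmcif"]` gives the PDBx/mmCIF download URL for the hemoglobin entry 4HHB. URL construction is kept separate from the network call so that identifier checks and endpoint patterns can be tested offline, and the base URLs sit in module constants so they can be pointed at another member site's service.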

Standards, Formats, and Interoperability

The archive maintains and promotes the PDBx/mmCIF file format and its metadata dictionary, which build on the Crystallographic Information Framework (CIF) developed within the International Union of Crystallography community, and has used PDBx/mmCIF as its master format since 2014. The legacy fixed-column PDB format remains available for entries small enough to fit its limits, while dictionary extensions cover cryo-EM and integrative structures, with contributions from projects at EMBL-EBI; widely used software such as OpenMM and MDAnalysis reads these formats. Interoperability extends to cheminformatics resources such as ChEMBL and to structural classification projects such as SCOP and CATH, supporting downstream integrative science.
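
To illustrate how the legacy format and PDBx/mmCIF carry the same information, the sketch below reads one atom from a fixed-column PDB `ATOM` record and from a trimmed `_atom_site` loop. The sample atom is illustrative rather than taken from a specific entry, the mmCIF reader is deliberately simplified (it assumes one row per line and no quoted or multi-line values), and the names `parse_pdb_atom` and `parse_atom_site_loop` are hypothetical; production code would normally rely on an established parser such as Biopython's `MMCIF2Dict` or gemmi.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative ATOM record (not from a specific entry), split at column boundaries.
SAMPLE_ATOM = (
    "ATOM      1  N   VAL A   1    "  # cols 1-30: record, serial, name, residue, chain, seq
    "   6.204  16.869   4.854"        # cols 31-54: x, y, z in angstroms
    "  1.00 49.05"                    # cols 55-66: occupancy, B-factor
    "           N"                    # cols 67-78: blank, then element symbol
)

# The same atom in a trimmed PDBx/mmCIF _atom_site loop (real files carry more items).
SAMPLE_MMCIF = """data_EXAMPLE
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
ATOM 1 N N VAL A 1 6.204 16.869 4.854 1.00 49.05
#
"""


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_pdb_atom(line: str) -> Atom:
    """Read one ATOM/HETATM record using the legacy format's fixed columns."""
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


def parse_atom_site_loop(text: str) -> list[dict[str, str]]:
    """Simplified reader: one row per line, no quoted or multi-line values."""
    names: list[str] = []
    rows: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            names.append(line.split(".", 1)[1])  # item name after the category prefix
        elif rows and (not line or line.startswith("#")):
            break  # a '#' or blank line after the rows ends the loop
        elif names and line and not line.startswith(("_", "#", "loop_", "data_")):
            rows.append(dict(zip(names, line.split())))
    return rows
```

The contrast is the point of the example: the legacy reader depends on exact character columns, whereas the mmCIF reader looks values up by item name, which is what allows PDBx/mmCIF to add items and hold very large structures without changing a fixed layout.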

Impact, Usage, and Community Outreach

The archive underpins discoveries reported in high-profile studies from institutions such as Harvard University, Stanford University, and the University of Oxford, and supports pharmaceutical research at companies such as Pfizer and Roche. Educational use spans university curricula, for example at the University of California, Berkeley, and public outreach at museums and events such as AAAS symposia. Community engagement includes collaborations with standards organizations such as the Global Alliance for Genomics and Health and participation in international efforts such as the COVID-19 Data Portal, reflecting the archive's broad scientific, medical, and technological impact.
