Protein Data Bank

Name	Protein Data Bank
Established	1971
Type	Biological database
Scope	Macromolecular structures
Country	International

Contents

History
Structure and Content
Data Submission and Validation
Access and Distribution
Software and Tools
Governance and Funding

The Protein Data Bank is an international repository for three-dimensional structural data of biological macromolecules. Founded in 1971, it serves structural biologists, computational chemists, pharmaceutical developers and educators by archiving experimentally determined coordinates and metadata. The archive underpins research linked to Nobel Prize-winning work, large-scale initiatives and structural genomics efforts.

History

The archive originated at Brookhaven National Laboratory, where Walter Hamilton and colleagues, together with researchers working in the tradition of G. N. Ramachandran, assembled coordinate collections that overlapped with datasets from Max Perutz and John Kendrew. Early growth reflected collaborations with institutions such as the MRC Laboratory of Molecular Biology, the Howard Hughes Medical Institute and the European Molecular Biology Laboratory, and with spin-off projects of the Human Genome Project. Landmark structures that built on experiments from Rosalind Franklin's era, later models from groups influenced by Linus Pauling, and techniques refined at Stanford University and the Massachusetts Institute of Technology drove adoption. During the 1990s, internationalization brought in partners in Japan and Europe, including the European Bioinformatics Institute and centers linked to RCSB PDB, whose work paralleled initiatives at the National Institutes of Health and the National Science Foundation. The archive’s role in responses to the SARS outbreak and the Ebola virus epidemic highlighted its importance for public health collaborations involving groups at the Centers for Disease Control and Prevention and the World Health Organization.

Structure and Content

The repository contains atomic coordinate files, experimental electron density maps and annotations for proteins, nucleic acids, complexes and ligands. These data come from X-ray crystallography performed at synchrotrons such as the Advanced Photon Source and Diamond Light Source, from nuclear magnetic resonance studies at facilities such as the University of Cambridge and ETH Zurich, and from cryo-electron microscopy reconstructions produced on instruments at the Max Planck Institute for Biochemistry and the MRC Laboratory of Molecular Biology. Entries cross-reference literature in journals such as Nature, Science, Cell, Proceedings of the National Academy of Sciences and Journal of Molecular Biology, and link to chemical resources such as PubChem and ChEMBL. Metadata records authors affiliated with universities such as Harvard University, the University of California, San Francisco and Yale University; with industrial partners including Pfizer, Roche and GlaxoSmithKline; and with collaborative consortia such as the Structural Genomics Consortium.
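
As an illustration of what an atomic coordinate file holds, the sketch below reads a single atom record. It assumes the legacy fixed-column PDB text format, in which each field of an ATOM or HETATM record occupies a set column range; the article itself does not specify a file format, and current depositions use PDBx/mmCIF instead. The `Atom` class and `parse_atom_line` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """A subset of the fields carried by one ATOM/HETATM record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one record, assuming the legacy fixed-column PDB layout."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain_id=line[21],             # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue sequence number
        x=float(line[30:38]),          # columns 31-38: x coordinate (angstroms)
        y=float(line[38:46]),          # columns 39-46: y coordinate (angstroms)
        z=float(line[46:54]),          # columns 47-54: z coordinate (angstroms)
    )


# Illustrative record; the exact column spacing is significant.
SAMPLE = "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N"

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE))
```

Slicing by column rather than splitting on whitespace matters in this format, because adjacent fields can run together without a separating space when values are wide. Production code would normally use an established parser rather than a hand-written one.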

Data Submission and Validation

Deposition workflows follow community standards developed by groups including the International Union of Crystallography, members of the Worldwide Protein Data Bank (wwPDB) partnership, and expert committees at laboratories associated with the European Organization for Nuclear Research. Submitters from institutions such as the University of Oxford, the University of Tokyo and Columbia University provide atomic models together with the supporting experimental data. Validation pipelines incorporate algorithms from laboratories linked to David Baker, methods developed by teams at Brookhaven National Laboratory, and implementations used in software from Lawrence Berkeley National Laboratory. Deposition policies were shaped by debates involving publishers such as Elsevier and Springer Nature and by funders including the Wellcome Trust and European Commission program directives. Community-driven validation reports draw on metrics pioneered in studies by investigators at the University of California, San Diego and the University of Toronto.

Access and Distribution

Data are distributed through mirrors and services operated by organizations such as RCSB PDB, PDBe and PDBj, with infrastructure partners at the European Bioinformatics Institute and the National Center for Biotechnology Information. Users access data through APIs, which are also consumed by platforms hosted on Google Cloud Platform and Amazon Web Services and by analysis portals developed in collaboration with Rosetta Commons and the OpenMM developers. Educational outreach relies on partnerships with museums, universities and research institutions, including the Smithsonian Institution and Cold Spring Harbor Laboratory. Licensing and open access policies reflect positions taken by funders such as the NIH and the Wellcome Trust, and international agreements promoted at World Health Organization meetings.
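
A minimal sketch of programmatic access follows. It rests on two assumptions that are not stated in this article: that entry files can be fetched from an RCSB PDB download URL of the form used below, and that identifiers are the classic four-character codes (a digit followed by three alphanumeric characters). The function names `entry_file_url` and `fetch_entry` are hypothetical, and the other wwPDB sites expose their own endpoints.

```python
import re
import urllib.request

# Assumption: classic four-character identifier, e.g. "1CRN".
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")

# Assumption: base URL of the RCSB PDB file download service.
_BASE_URL = "https://files.rcsb.org/download"


def entry_file_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build the assumed download URL for one entry in mmCIF or legacy PDB format."""
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{_BASE_URL}/{pdb_id.upper()}.{fmt}"


def fetch_entry(pdb_id: str, fmt: str = "cif", timeout: float = 30.0) -> str:
    """Download an entry and return its text; requires network access."""
    with urllib.request.urlopen(entry_file_url(pdb_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(entry_file_url("1crn"))
```

Validating the identifier before building the URL keeps malformed input from reaching the network. For bulk retrieval, the archive mirrors are generally a better fit than per-entry requests.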

Software and Tools

Visualization, modeling and analysis rely on tools from several communities: groups at institutions such as the University of California, San Diego, home to developers of molecular viewers; early contributors to PyMOL with ties to PDB-101 educators; developers of fitting and refinement software at laboratories linked to the Royal Institution; and teams at the New York Structural Biology Center and EMBL facilities that created cryo-EM toolkits. Widely used packages include software influenced by developers from David Baker’s group, algorithmic contributions from laboratories affiliated with Andrej Sali, and molecular dynamics engines associated with researchers at George Washington University and the University of Illinois Urbana-Champaign. Computational pipelines integrate resources from NCBI, UniProt, KEGG and Reactome, along with cheminformatics tools developed at Scripps Research.

Governance and Funding

Governance evolved through partnerships among organizations such as the Research Collaboratory for Structural Bioinformatics, Protein Data Bank Japan and the European Bioinformatics Institute. Consortia were formed with input from stakeholders at the National Science Foundation, the National Institutes of Health and Japan's Ministry of Education, Culture, Sports, Science and Technology, and from funders such as the Wellcome Trust. Budgetary and policy decisions reflect consultations with international bodies, including science policy groups at the level of the Organisation for Economic Co-operation and Development, and advisory input from committees that include representatives of the Howard Hughes Medical Institute, the Max Planck Society and major research universities such as Princeton University and Johns Hopkins University.
