PDB — LLMpedia

PDB
Name	PDB
Established	1971
Type	Repository
Discipline	Structural biology
Country	United States
Affiliations	Research Collaboratory for Structural Bioinformatics, Worldwide Protein Data Bank

The Protein Data Bank (PDB) is a centralized archive of three-dimensional structural data for biological macromolecules and assemblies, including proteins, nucleic acids, and their complexes. It is maintained by the Worldwide Protein Data Bank partners, among them RCSB PDB, Protein Data Bank in Europe, and Protein Data Bank Japan, and serves as a foundational resource for researchers at institutions such as the National Institutes of Health and the European Molecular Biology Laboratory. Scientists in laboratories led by figures like Richard Henderson, Ada Yonath, Venki Ramakrishnan, Jennifer Doudna, and Emmanuelle Charpentier have contributed to the structural record, and archive entries underpin research on CRISPR–Cas9, the ribosome, hemoglobin, HIV-1 protease, and COVID-19.

Overview

The archive collects atomic coordinates, experimental data, and metadata for macromolecules determined by methods such as X-ray crystallography, nuclear magnetic resonance spectroscopy, and cryo-electron microscopy. Users include principal investigators at institutions such as Massachusetts Institute of Technology, Stanford University, Max Planck Society, University of Cambridge, and University of Tokyo, as well as companies like Pfizer, Roche, and AstraZeneca. The resource interoperates with complementary databases including UniProtKB, GenBank, SCOP, CATH, and KEGG to enable integrated analyses.

History

The archive was founded in 1971 at Brookhaven National Laboratory, with funding from organizations including the United States Department of Energy and the National Science Foundation. It has grown from a handful of depositions, such as the structure of myoglobin, to more than 200,000 entries. Milestones include the Research Collaboratory for Structural Bioinformatics taking over management of the archive in the late 1990s, the creation of the Worldwide Protein Data Bank partnership in 2003, and rapid expansion driven by advances from groups such as Aaron Klug's team and later cryo-EM innovators like Jacques Dubochet and Joachim Frank. Landmark depositions include crystal structures of DNA polymerase, atomic models of the ribosome, and structures critical to understanding influenza and SARS-CoV-2.

Structure and Function

Entries describe macromolecules at the level of individual atoms or residues, depending on the resolution of the experiment. Each entry records chain composition cross-referenced to UniProtKB accession codes, ligand information referencing small-molecule resources like PubChem and ChEMBL, and annotations that connect to functional studies published in journals like Nature, Science, and Cell. Curators at partner sites such as the European Bioinformatics Institute and Protein Data Bank Japan check that residue numbering and related details are consistent, and validation reports reference standards adopted by organizations such as the International Union of Crystallography. Structural models enable mechanistic insights into enzymes like DNA polymerase I, receptors such as G protein-coupled receptors, and complexes including the spliceosome and the proteasome.

Data Formats and Standards

Originally distributed in the simple, fixed-column PDB file format, the archive now supports richer formats, including mmCIF and an XML representation (PDBML), developed in collaboration with groups like the Worldwide Protein Data Bank and the International Union of Crystallography. Metadata schemas align with community standards such as the FAIR principles and link to Digital Object Identifier registries and to the accession systems of databases like Protein Data Bank in Europe. Validation pipelines incorporate methods and recommendations from committees associated with the International Union of Crystallography and are built into the deposition tools used by investigators at institutions such as the University of California, San Francisco and Imperial College London.
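To illustrate what the legacy PDB file format looks like in practice, the following minimal Python sketch parses a single ATOM record by column position. The column ranges follow the published wwPDB layout for coordinate records; the Atom class, the parse_atom_record function, and the sample coordinates are hypothetical names and values introduced here for illustration only. Production code would normally use an established parser and the mmCIF format instead.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical container for the fields of one coordinate record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_record(line: str) -> Atom:
    """Parse one ATOM/HETATM line using the legacy fixed-column layout."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    line = line.rstrip("\n").ljust(80)  # pad so short lines slice safely
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        b_factor=float(line[60:66]),   # columns 61-66
        element=line[76:78].strip(),   # columns 77-78
    )


# Illustrative record assembled field by field so the columns are explicit.
# The values are made up for this sketch, not taken from a real entry.
SAMPLE = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: blank
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: blank
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # column 27 insertion code, columns 28-30 blank
    "  27.340"    # columns 31-38: x coordinate
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
    "  1.00"      # columns 55-60: occupancy
    "  9.67"      # columns 61-66: temperature factor
    "          "  # columns 67-76: blank
    " N"          # columns 77-78: element symbol
)

atom = parse_atom_record(SAMPLE)
print(atom.name, atom.res_name, atom.chain_id, (atom.x, atom.y, atom.z))
```

Because every field lives at a fixed position, the format cannot grow to accommodate very large assemblies or longer identifiers, which is one reason the key-value structure of mmCIF is better suited to current depositions.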

Access and Tools

Data are accessible through web portals maintained by centers such as RCSB PDB, Protein Data Bank Japan, and Protein Data Bank in Europe, and via programmatic APIs used by software including Rosetta, PyMOL, ChimeraX, Phenix, and CCP4. Visualization and analysis tools developed by academic groups, including teams at Stanford University, facilitate tasks such as molecular replacement, docking, and model refinement. Integrated services connect PDB entries to literature indexed by PubMed and to sequence repositories such as GenBank.
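As a rough sketch of programmatic access, the Python snippet below builds the URLs for an entry's coordinate file and its metadata record. The base URLs are assumptions reflecting the RCSB PDB file-download and Data API patterns as commonly documented, and they may change; the helper names (normalize_id, coordinate_file_url, entry_metadata_url, fetch_entry_metadata) are hypothetical. The example uses 4HHB, a hemoglobin entry, and only handles classic four-character identifiers.

```python
import json
import re
import urllib.request

# Assumed URL patterns for RCSB PDB services; check the current
# documentation before relying on them.
FILES_BASE = "https://files.rcsb.org/download"
DATA_API_BASE = "https://data.rcsb.org/rest/v1/core/entry"

# Classic PDB identifiers: one digit followed by three alphanumerics.
_CLASSIC_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def normalize_id(pdb_id: str) -> str:
    """Validate a classic four-character PDB ID and return it in upper case."""
    pdb_id = pdb_id.strip()
    if not _CLASSIC_ID.fullmatch(pdb_id):
        raise ValueError(f"not a classic four-character PDB ID: {pdb_id!r}")
    return pdb_id.upper()


def coordinate_file_url(pdb_id: str, fmt: str = "cif") -> str:
    """Return the download URL for an entry in mmCIF ('cif') or legacy 'pdb' format."""
    if fmt not in ("cif", "pdb"):
        raise ValueError("fmt must be 'cif' or 'pdb'")
    return f"{FILES_BASE}/{normalize_id(pdb_id)}.{fmt}"


def entry_metadata_url(pdb_id: str) -> str:
    """Return the URL of the entry-level metadata record (JSON)."""
    return f"{DATA_API_BASE}/{normalize_id(pdb_id)}"


def fetch_entry_metadata(pdb_id: str, timeout: float = 10.0) -> dict:
    """Download and decode entry metadata; requires network access."""
    with urllib.request.urlopen(entry_metadata_url(pdb_id), timeout=timeout) as resp:
        return json.load(resp)


# Building URLs needs no network access.
print(coordinate_file_url("4hhb"))
print(entry_metadata_url("4hhb"))
```

Keeping URL construction separate from the network call makes the helpers easy to test offline and leaves room for caching or rate limiting around fetch_entry_metadata.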

Applications and Impact

Structural data underpin drug discovery campaigns at companies like Novartis and Johnson & Johnson and inform vaccine design efforts such as those at Moderna and Pfizer–BioNTech. Academic studies relying on archive entries have elucidated mechanisms in systems including photosystem II, ATP synthase, telomerase, and CRISPR–Cas9. The repository supports education at universities such as Harvard University and Yale University and contributes to research recognized by prizes such as the Nobel Prize in Chemistry and the Lasker Award.

Challenges and Future Directions

Challenges include managing growth driven by high-throughput cryo-EM facilities at centers like Diamond Light Source and the European Synchrotron Radiation Facility, integrating multi-scale models from initiatives such as the Human Cell Atlas, and improving the representation of intrinsically disordered proteins studied by groups including those at the University of Oxford. Ongoing priorities involve enhancing metadata interoperability with resources like BioSamples, adopting machine-readable provenance standards championed by institutions including the National Institutes of Health, and enabling community annotation along the lines of GitHub-hosted open projects. Advances in artificial intelligence from organizations like DeepMind and academic labs at the University of Washington may further accelerate model building, while governance by the Worldwide Protein Data Bank will guide policy on data quality, reproducibility, and open access.
