PDB archive — LLMpedia

PDB archive
Name	Protein Data Bank archive
Established	1971
Type	biological database
Discipline	Structural biology, bioinformatics
Country	United States

The PDB archive is the global repository for three-dimensional structural data of biological macromolecules. It serves as a primary resource for researchers at institutions such as the Allen Institute, Massachusetts Institute of Technology, and Harvard University, and at industrial laboratories such as GlaxoSmithKline, Pfizer, and Roche. The archive continues the structural biology tradition associated with Francis Crick, James Watson, Rosalind Franklin, and Max Perutz, and underpins current work at the European Molecular Biology Laboratory, University of Oxford, and Stanford University.

History and development

The archive was established in 1971 at Brookhaven National Laboratory, drawing on discussions among scientists at Cold Spring Harbor Laboratory and the University of Cambridge. Early contributors included researchers associated with Royal Society meetings and projects linked to the National Institutes of Health, Wellcome Trust, and Howard Hughes Medical Institute. Over the following decades the archive grew on foundations laid by the X-ray crystallography methods of William Henry Bragg and by later techniques recognized with the Nobel Prize in Chemistry, and it was adopted by consortia including the Protein Structure Initiative and structural genomics centers funded by the Department of Energy. The internationalization of the archive paralleled efforts by Research Councils UK and the European Research Council, and partnerships with Riken and the Max Planck Society.

The archive stores coordinate sets, experimental data, and metadata for proteins, nucleic acids, complexes, and assemblies from projects at institutions such as the Salk Institute and Johns Hopkins University and at laboratories associated with Stanley Cohen. Each entry encodes atomic coordinates and residue annotations and references chemical component dictionaries, following standards from the International Union of Crystallography and the Worldwide Protein Data Bank (wwPDB), a partnership among RCSB PDB, PDBe, PDBj, and BMRB. The data model supports structures determined by X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy, and entries are cross-referenced with databases such as UniProt, GenBank, RefSeq, Ensembl, and PubChem. Chemical components are linked to identifiers used by Chemical Abstracts Service and to ontologies harmonized with Gene Ontology and Systems Biology Markup Language initiatives.
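To make "atomic coordinates and residue annotations" concrete, the following minimal sketch parses one fixed-column ATOM record of the legacy PDB text format. The choice of format, the sample values, and the names AtomRecord and parse_atom_line are assumptions made for illustration and do not come from the archive's own tooling. The archive's master format is PDBx/mmCIF, which needs a different parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomRecord:
    """A subset of the fields carried by one ATOM/HETATM line (hypothetical type)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Parse a fixed-column ATOM/HETATM line.

    The format specification numbers columns from 1, so format columns 31-38
    (the x coordinate) correspond to the Python slice [30:38].
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


# Illustrative record assembled field by field so the column widths stay explicit.
# The coordinate values are made up for the example.
SAMPLE_LINE = (
    f"{'ATOM':<6}{1:>5} {' N  '} {'MET'} {'A'}{1:>4}    "
    f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}{1.00:6.2f}{9.67:6.2f}{'':10}{'N':>2}"
)

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE_LINE))
```

Slicing by column rather than splitting on whitespace matters here: adjacent fields in this format may touch without a separating space, so a whitespace split can silently merge two values.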

Researchers from laboratories such as Cold Spring Harbor Laboratory, the European Molecular Biology Laboratory, and the Max Planck Institute submit structures through deposition systems managed by organizations including RCSB PDB and Protein Data Bank Japan. Submissions require experimental data files and metadata that follow community standards developed at the International Conference on Structural Genomics and at workshops sponsored by the National Science Foundation and Wellcome Trust. Validation pipelines incorporate software tools from teams at Lawrence Berkeley National Laboratory and Argonne National Laboratory and from academic groups at Emory University and the University of California, San Diego. Validation reports draw on benchmarks from initiatives such as Critical Assessment of Structure Prediction and help authors meet the requirements of journals such as Nature, Science, and Cell.

The archive is distributed through regional partners including RCSB PDB (United States), PDBe (Europe), and PDBj (Japan), with mirrors at centers such as Stanford Synchrotron Radiation Lightsource and Diamond Light Source. Access policies reflect commitments to open science promoted by the National Institutes of Health and Wellcome Trust, and they enable downloads by researchers at the University of Cambridge, University of Tokyo, and Tsinghua University and by industry groups at Novartis and AstraZeneca. The data formats are compatible with tools developed at the European Bioinformatics Institute and with standards set by committees of the International Union of Crystallography. They support programmatic access through APIs used in pipelines at projects linked to Google DeepMind and on community platforms such as GitHub.
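As a sketch of what programmatic access can look like, the snippet below builds a download URL for a single entry and fetches it with the Python standard library. The base URL is an assumption about the RCSB PDB file download service and should be checked against that partner's current documentation. The helper names entry_file_url and fetch_entry are hypothetical, and only classic four-character identifiers are handled.

```python
import re
import urllib.request

# Assumed base URL of the RCSB PDB file download service; confirm before relying on it.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"

# Classic four-character identifiers: one digit followed by three letters or digits.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def entry_file_url(pdb_id: str, fmt: str = "cif") -> str:
    """Return the download URL for one entry in mmCIF ("cif") or legacy "pdb" format."""
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id.upper()}.{fmt}"


def fetch_entry(pdb_id: str, fmt: str = "cif", timeout: float = 30.0) -> str:
    """Download one entry as text. Requires network access."""
    with urllib.request.urlopen(entry_file_url(pdb_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(entry_file_url("1crn"))
```

Validating the identifier before building the URL keeps malformed input out of the request path. For bulk use, the partners' mirror and API services are a better fit than fetching entries one at a time.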

A broad ecosystem of visualization, analysis, and modeling software integrates with the archive, including UCSF ChimeraX, developed at the University of California, San Francisco, and packages from the Weizmann Institute of Science. Computational methods from groups at DeepMind, the University of Washington, and the Broad Institute use archive data for machine learning, while modeling suites from Rosetta Commons and crystallography software originating at Lawrence Berkeley National Laboratory support structure refinement. Services for ligand validation and chemical component mapping link to databases maintained by Chemical Abstracts Service and to tools used in workflows at the European Molecular Biology Laboratory, including EMBL-EBI.

Governance is coordinated by an international consortium that includes RCSB PDB, PDBe, and PDBj, together with advisory boards that have representation from the National Science Foundation, Wellcome Trust, and research leaders at Harvard Medical School and Yale University. Curation is performed by staff trained in standards promulgated by the International Union of Crystallography, who liaise with journal editors at Nature Structural & Molecular Biology and with funding agencies such as the National Institutes of Health. Policies on deposition, embargo, and access reflect community norms established at meetings at Cold Spring Harbor Laboratory and at symposia organized by Gordon Research Conferences.

The archive underlies drug design at companies such as Pfizer and Roche, vaccine design in projects funded by the Bill & Melinda Gates Foundation, and computational breakthroughs from teams at DeepMind and the University of Toronto. It supports education and outreach in courses at Massachusetts Institute of Technology and the University of California, Berkeley, and it informs patents and regulatory submissions reviewed by the U.S. Food and Drug Administration. Across basic and applied science, the archive is cited in work by Nobel laureates and by research groups at the Max Planck Society, Sloan Kettering Institute, and Johns Hopkins University School of Medicine, as well as by international structural biology consortia.

Category:Biological databases