Protein Data Bank

Title	Protein Data Bank
Type	Structural biology database
Owner	Worldwide Protein Data Bank (wwPDB)
Launch date	1971
Current status	Active

Contents

History
Content and file format
Data deposition and annotation
Data retrieval and analysis tools
Impact and applications

The Protein Data Bank (PDB) is a global archive of three-dimensional structural data for large biological molecules, such as proteins, nucleic acids, and viruses. Managed by the international Worldwide Protein Data Bank (wwPDB) consortium, it serves as a foundational resource for researchers in fields like molecular biology, drug discovery, and bioinformatics. The data, obtained primarily through X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy, are freely accessible to the public.

History

The archive was founded in 1971 at Brookhaven National Laboratory by Walter Hamilton and colleagues, following pioneering work by crystallographers like John Kendrew and Max Perutz. Its initial collection contained just seven structures, including the landmark model of myoglobin. In 1998, management shifted to the Research Collaboratory for Structural Bioinformatics, a partnership involving Rutgers University, the University of California, San Diego, and the National Institute of Standards and Technology. This transition marked the beginning of its internationalization, which was formalized in 2003 with the establishment of the Worldwide Protein Data Bank, incorporating partners like the Protein Data Bank Japan at Osaka University and PDBe at the European Bioinformatics Institute.

Content and file format

The archive stores atomic coordinate files detailing the positions of atoms within macromolecules, along with experimental metadata and derived structural features. The legacy representation is the PDB file format, a fixed-column text format with specific records for atom coordinates, secondary structure, and crystallographic details. To address limitations of this format, the newer mmCIF (macromolecular Crystallographic Information File) format, developed under the auspices of the International Union of Crystallography, was adopted as the standard archival representation. For structures determined by cryo-electron microscopy, the experimental density maps are deposited in the Electron Microscopy Data Bank (EMDB), while the atomic models fitted to those maps are held in the main repository; the two archives are closely integrated and cross-referenced.
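As an illustration of how the legacy format encodes atom coordinates, the following minimal Python sketch reads a single ATOM or HETATM record by column position. The column ranges follow the published fixed-column layout of the PDB format; the names `Atom` and `parse_atom_record` are hypothetical and chosen only for this example, and the sample line is made up rather than taken from a real entry. Production code would more likely rely on an established parsing library.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """A subset of the fields carried by one ATOM/HETATM record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_record(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM line (hypothetical helper)."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    # Pad to 80 columns so slicing never runs past a short line.
    line = line.rstrip("\n").ljust(80)
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21].strip(),     # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38, in angstroms
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        element=line[76:78].strip(),   # columns 77-78
    )


if __name__ == "__main__":
    # Made-up record for illustration; not copied from an archive entry.
    sample = (
        "ATOM      1  N   VAL A   1      -2.900  17.600  15.500"
        "  1.00  0.00           N"
    )
    print(parse_atom_record(sample))
```

Because every field lives at a fixed column, the format cannot represent very large structures (for example, more than 99,999 atoms) without workarounds, which is one of the limitations the mmCIF format was designed to remove.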

Data deposition and annotation

Deposition of structural data is mandatory for publication in most major scientific journals, including Nature, Science, and Cell. Authors submit data files and experimental details through member sites of the Worldwide Protein Data Bank, such as the RCSB PDB portal. Expert annotators then process and validate the entries, checking for consistency with the experimental data, proper chemical description of molecules like ligands and nucleotides, and overall geometric quality. This rigorous curation ensures the reliability and utility of the archive for the global research community.

Data retrieval and analysis tools

Users can search and retrieve data through web portals provided by RCSB PDB, PDBe, and PDBj. These sites offer advanced query capabilities based on criteria like protein name, author, organism (e.g., Homo sapiens), or enzyme classification number. Interactive three-dimensional viewers such as Mol* and JSmol run in the browser on these sites, while desktop programs such as UCSF Chimera can load downloaded entries for further visualization and analysis. Specialized resources like the PDBsum database provide schematic analyses of protein-ligand interactions, hydrogen bond networks, and protein folding motifs.
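Entries can also be retrieved programmatically by identifier. The sketch below only builds a download address and performs no network access. It assumes the classic four-character identifier scheme (a digit followed by three alphanumeric characters) and the RCSB file-download URL pattern `https://files.rcsb.org/download/<ID>.<ext>`; the function name `entry_url` is hypothetical, and the other wwPDB member sites use their own addresses, so the current site documentation should be checked before relying on this pattern.

```python
import re

# Assumption: classic four-character PDB identifiers, e.g. "1MBN".
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")

# Assumption: RCSB file-download base address.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def entry_url(pdb_id: str, fmt: str = "cif") -> str:
    """Return a download URL for an entry (hypothetical helper).

    fmt is "cif" for mmCIF or "pdb" for the legacy format.
    """
    pdb_id = pdb_id.strip()
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id.upper()}.{fmt}"


if __name__ == "__main__":
    # The resulting address could be passed to urllib.request.urlopen
    # or any other HTTP client to fetch the file.
    print(entry_url("1mbn"))
    print(entry_url("1mbn", fmt="pdb"))
```

Validating the identifier before building the address keeps malformed input from reaching the server and makes failures easier to diagnose.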

Impact and applications

The resource has had a profound impact on biomedical science, providing the structural basis for understanding mechanisms of diseases like Alzheimer's disease and cancer. It is indispensable for structure-based drug design, enabling the development of pharmaceuticals such as HIV protease inhibitors and kinase inhibitors. In basic research, it supports studies in evolutionary biology, enzymology, and systems biology. Structural genomics programs such as the Protein Structure Initiative contributed many structures to the archive, and structure prediction methods such as AlphaFold were trained on its contents, underscoring its role as core scientific infrastructure.
