DOCK — LLMpedia

DOCK
Name	DOCK
Developer	University of California, San Francisco; laboratories and collaborators
Released	1982
Latest release	multiple versions (iterative)
Programming language	C++, C, Fortran, Python (bindings)
Operating system	Unix-like, Linux, macOS, Windows (via ports)
Genre	Molecular docking, computational chemistry, cheminformatics
License	academic, open-source variants, commercial licenses

Contents

Overview
History and Development
Architecture and Algorithms
Applications and Use Cases
Performance and Evaluation
Integration and Extensibility
Licensing and Availability

DOCK is a family of molecular docking programs designed to predict how small-molecule ligands bind to macromolecular targets such as proteins and nucleic acids. It originated in academic computational chemistry and has been used in structure-based drug discovery, virtual screening, and fragment-based design. The suite interfaces with experimental structural biology pipelines and cheminformatics toolkits to propose, score, and rank binding poses for hypothesis-driven research.

Overview

DOCK combines geometric matching, scoring functions, and sampling techniques to position ligands in receptor binding sites derived from X-ray crystallography, NMR spectroscopy, or cryo-EM maps. A typical workflow starts from structures such as those deposited in the Protein Data Bank, prepares them for docking, and feeds virtual screening strategies commonly employed by groups at institutions like Harvard University, Massachusetts Institute of Technology, and Stanford University. The suite includes utilities for grid generation, energy evaluation, conformer enumeration, and hit selection, and is frequently used alongside toolchains from OpenEye Scientific Software, Schrödinger, and community projects such as RDKit.
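The geometric matching idea mentioned above can be illustrated with a minimal sketch. It is not DOCK's actual implementation: it assumes the binding site is described by a handful of hypothetical site points and brute-forces every pairing of ligand atoms to site points whose internal distances agree within a tolerance. The function name `find_matches` and the parameters `tol` and `size` are illustrative assumptions.

```python
import itertools
import math


def find_matches(ligand_atoms, site_points, tol=0.5, size=3):
    """Return pairings [(ligand_index, site_index), ...] of length `size`
    whose ligand-internal and site-internal distances agree within `tol`.

    Brute force for illustration only; practical docking codes use much
    more efficient matching than exhaustive enumeration.
    """
    matches = []
    pair_slots = list(itertools.combinations(range(size), 2))
    for lig_idx in itertools.combinations(range(len(ligand_atoms)), size):
        for site_idx in itertools.permutations(range(len(site_points)), size):
            compatible = all(
                abs(
                    math.dist(ligand_atoms[lig_idx[a]], ligand_atoms[lig_idx[b]])
                    - math.dist(site_points[site_idx[a]], site_points[site_idx[b]])
                )
                <= tol
                for a, b in pair_slots
            )
            if compatible:
                matches.append(list(zip(lig_idx, site_idx)))
    return matches
```

Each returned pairing defines a correspondence from which a rigid-body transform could be fitted to place the ligand in the site; that fitting step is omitted here.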

History and Development

Development began in the early 1980s in computational chemistry laboratories affiliated with prominent research centers that also produced tools used by National Institutes of Health-funded programs and international consortia. Early milestones paralleled landmark studies in structure-based drug design published by teams at University of California, San Francisco and collaborative work with groups at Scripps Research Institute and GlaxoSmithKline. Subsequent versions incorporated advances inspired by methods from contributors associated with University of Cambridge and industrial partners at Pfizer and Novartis. Community-driven enhancements and forks emerged from academic labs with ties to projects at European Molecular Biology Laboratory and initiatives supported by Wellcome Trust grants.

Architecture and Algorithms

The architecture separates preprocessing, sampling, scoring, and postprocessing into distinct modules, a pattern also seen in software such as Rosetta and in platforms developed at Lawrence Berkeley National Laboratory. Sampling algorithms include rigid-body placement, flexible ligand placement, and fragment linking, comparable to strategies in publications from Columbia University and Johns Hopkins University. Scoring incorporates physics-based terms (van der Waals, electrostatics) and empirical components reminiscent of approaches used by researchers affiliated with University of Cambridge and Karolinska Institutet. Grid-based potentials and fast lookup tables precompute the receptor's contribution once, so each ligand pose is scored by table lookup instead of summing over all receptor atoms; this enables throughput comparable to high-performance implementations at Los Alamos National Laboratory and optimized libraries used in supercomputing centers like Argonne National Laboratory.
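As a rough illustration of grid-based scoring with van der Waals and electrostatic terms, the sketch below precomputes two lookup tables over a box around the receptor and scores a ligand by nearest-grid-point lookup. It is a simplified assumption-laden example, not DOCK's scoring code: it uses a single generic 12-6 Lennard-Jones probe, a constant dielectric, and no interpolation. The parameter values (`eps`, `sigma`, `dielectric`) and all function names are hypothetical.

```python
import math

COULOMB = 332.0  # approximate Coulomb constant in kcal*Angstrom/(mol*e^2)


def probe_terms(r, eps=0.1, sigma=3.5, dielectric=4.0):
    """12-6 Lennard-Jones energy and per-unit-charge Coulomb term at distance r."""
    r = max(r, 0.5)  # clamp to avoid the singularity at r = 0
    lj = 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    per_charge = COULOMB / (dielectric * r)
    return lj, per_charge


def build_grids(receptor, origin, spacing, shape):
    """Precompute vdW and electrostatic tables.

    receptor: list of (x, y, z, partial_charge) tuples.
    Returns two dicts keyed by integer grid index (i, j, k).
    """
    vdw, elec = {}, {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (
                    origin[0] + i * spacing,
                    origin[1] + j * spacing,
                    origin[2] + k * spacing,
                )
                v = e = 0.0
                for x, y, z, q in receptor:
                    lj, per_charge = probe_terms(math.dist(point, (x, y, z)))
                    v += lj
                    e += q * per_charge
                vdw[(i, j, k)] = v
                elec[(i, j, k)] = e
    return vdw, elec


def grid_score(ligand, vdw, elec, origin, spacing):
    """Sum table lookups at the grid point nearest each ligand atom.

    ligand: list of (x, y, z, partial_charge) tuples.
    Atoms outside the grid raise KeyError in this sketch.
    """
    total = 0.0
    for x, y, z, q in ligand:
        idx = tuple(round((c - o) / spacing) for c, o in zip((x, y, z), origin))
        total += vdw[idx] + q * elec[idx]
    return total
```

The expensive loop over receptor atoms runs once in `build_grids`; scoring each pose afterwards costs one lookup per ligand atom, which is the point of the grid approach. A production implementation would typically interpolate between grid points and use atom-type-specific van der Waals parameters.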

Applications and Use Cases

Groups in academia and industry have applied the software to target families such as kinases studied at Dana-Farber Cancer Institute, G-protein-coupled receptors investigated at Scripps Research Institute, and viral proteases analyzed in collaborations with Centers for Disease Control and Prevention. Use cases include virtual screens of compound libraries drawn from repositories such as the ZINC database, fragment-based campaigns coordinated with structural genomics groups such as the Structural Genomics Consortium, and lead optimization workflows used by teams at AstraZeneca and Bayer. The suite also supports educational exercises in structural biology courses at universities like University of Oxford and University of Tokyo.

Performance and Evaluation

Benchmarking studies have compared the software against contemporaries such as tools from Schrödinger and platforms developed by OpenEye Scientific Software using datasets derived from the Protein Data Bank and curated challenge sets produced by community initiatives like the D3R Grand Challenge. Metrics include pose reproduction (root-mean-square deviation), enrichment in virtual screens (ROC AUC), and computational throughput on clusters similar to those at Texas Advanced Computing Center. Publications from groups at University of California, San Diego and Imperial College London report mixed strengths: robust sampling for rigid targets and variable performance for highly flexible binding sites, motivating hybrid workflows that combine ensemble docking and machine-learning rescoring.
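The two accuracy metrics named above can be sketched in a few lines. This is an illustrative example under stated assumptions: the RMSD is computed without any superposition or symmetry correction and assumes both poses list atoms in the same order, and the ROC AUC assumes that lower docking scores are better. The function names are hypothetical.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def roc_auc(scores, labels):
    """ROC AUC for a virtual screen where a lower score means a better rank.

    labels: 1 for known actives, 0 for decoys.
    Computed as the fraction of (active, decoy) pairs in which the active
    scores better than the decoy, counting ties as half.
    """
    actives = [s for s, label in zip(scores, labels) if label == 1]
    decoys = [s for s, label in zip(scores, labels) if label == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active ranked ahead of every decoy. In pose-reproduction studies an RMSD threshold is usually chosen by the benchmark authors; no specific cutoff is assumed here.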

Integration and Extensibility

The codebase exposes interfaces for scripting and pipeline integration, enabling connections with cheminformatics toolkits such as RDKit and visualization systems like PyMOL and UCSF Chimera. Bioinformatics pipelines at institutes including European Bioinformatics Institute and Max Planck Society laboratories integrate DOCK components into automated hit triage, while community plugins allow interoperability with workflow managers like Snakemake and Nextflow. Extensibility points permit incorporation of force fields and solvation models developed by groups at Swiss National Supercomputing Centre and custom scoring functions from academic collaborators.

Licensing and Availability

Distribution models vary: academic licenses historically enabled use in noncommercial research at institutions such as University of California, San Francisco and Yale University, while commercial agreements serve industrial partners like Merck and Roche. Open-source forks and modules are maintained by community contributors in GitHub-hosted repositories and mirrored by institutional archives at universities including University of Minnesota. Documentation and related preprints are typically distributed through academic channels associated with major structural biology meetings such as Gordon Research Conferences and workshops at Cold Spring Harbor Laboratory.

Category:Computational chemistry software