| RDKit | |
|---|---|
| Name | RDKit |
| Developer | Greg Landrum; Novartis; community contributors |
| Released | 2006 |
| Programming language | C++; Python; Java |
| Operating system | Linux; Microsoft Windows; macOS |
| License | BSD-style |
RDKit is an open-source cheminformatics toolkit that provides libraries for molecular representation, cheminformatics algorithms, and interoperability with machine-learning tools. It supports cheminformatics workflows in pharmaceutical research, materials science, and academic projects, and it integrates with common scientific-computing and data-science ecosystems. The project is widely adopted in industry and academia and is maintained by contributors from companies, research institutes, and independent developers.
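The following Python snippet is a minimal sketch of the core workflow. It assumes the `rdkit` package is installed, for example from PyPI or conda-forge. The snippet parses a SMILES string, runs a SMARTS substructure query and computes a few descriptors. The molecule (aspirin) and the SMARTS pattern are illustrative choices, not part of any standard recipe.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Parse a SMILES string into an RDKit molecule object (aspirin, chosen for illustration)
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

# SMARTS pattern for a carboxylic acid: carbonyl carbon bonded to an OH oxygen
acid = Chem.MolFromSmarts("C(=O)[OX2H1]")
has_acid = mol.HasSubstructMatch(acid)
matches = mol.GetSubstructMatches(acid)  # tuple of atom-index tuples

# A few common descriptors computed from the molecular graph
props = {
    "MolWt": Descriptors.MolWt(mol),
    "NumHDonors": Descriptors.NumHDonors(mol),
    "TPSA": Descriptors.TPSA(mol),
}

print(has_acid, len(matches), props)
```

The ester oxygen in aspirin carries no hydrogen, so only the free carboxylic acid group matches the pattern.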
RDKit offers capabilities for chemical representation, substructure searching, molecular fingerprinting, molecular descriptor calculation, conformer generation, and reading and writing common cheminformatics file formats. Its Python API produces results that convert readily to NumPy arrays and pandas DataFrames, so they can be passed to SciPy, scikit-learn, and deep-learning frameworks such as TensorFlow. In industry it is often used alongside commercial packages from vendors such as OpenEye Scientific Software and Schrödinger. Its users include companies such as Novartis and Pfizer and research groups at the Massachusetts Institute of Technology, Stanford University, the University of Cambridge, and ETH Zurich.
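The snippet below sketches how RDKit output can feed the scientific Python stack. Morgan fingerprints are converted to NumPy arrays with `DataStructs.ConvertToNumpyArray` and then used to fit a scikit-learn regressor. The sketch assumes RDKit, NumPy and scikit-learn are installed. The molecules and activity values are made up purely for illustration. The random-forest model is an arbitrary choice, not a recommended QSAR method.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestRegressor

# Toy training set: SMILES with made-up activity values (illustrative only)
smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1O", "c1ccccc1CO", "CC(=O)O"]
activity = np.array([0.1, 0.3, 0.5, 1.2, 1.0, 0.2])

# Morgan (circular) fingerprint generator: radius 2, 1024 bits
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=1024)

def featurize(smi: str) -> np.ndarray:
    """Convert a SMILES string into a 0/1 NumPy feature vector."""
    mol = Chem.MolFromSmiles(smi)
    fp = gen.GetFingerprint(mol)
    arr = np.zeros((fp.GetNumBits(),), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)  # fills arr with the bit values
    return arr

# Feature matrix: one row per molecule, one column per fingerprint bit
X = np.vstack([featurize(s) for s in smiles])

# Fit a regressor and predict for an unseen molecule (1-pentanol)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, activity)
pred = model.predict(featurize("CCCCCO").reshape(1, -1))
print(X.shape, pred)
```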
RDKit grew out of software that Greg Landrum and collaborators developed at Rational Discovery, and it was released as open source in 2006. The aim was a permissively licensed cheminformatics library that could serve as an alternative to proprietary toolkits. Early contributions came from researchers affiliated with Novartis and from open-source communities around related projects such as Open Babel and the Chemistry Development Kit (CDK). The project has since grown through contributions from companies such as GlaxoSmithKline and AstraZeneca and from academic labs at the University of California, San Francisco, and Imperial College London. Development relies on common open-source infrastructure such as GitHub and continuous-integration tools such as Jenkins. The toolkit supports community standards including the IUPAC InChI identifier and the SMILES and SDF file formats.
RDKit implements cheminformatics routines comparable to those in commercial tools such as the Daylight toolkits and the Molecular Operating Environment (MOE). Its core features include:

- parsing and canonicalization of line notations such as SMILES;
- generation of InChI identifiers through the IUPAC InChI library;
- 2D coordinate generation for depiction;
- 3D conformer generation based on distance geometry, using the ETKDG method, which combines distance-geometry embedding with experimental torsion-angle preferences.

It also provides fingerprint algorithms widely used in the cheminformatics literature, including in the Journal of Chemical Information and Modeling. These include Morgan (circular) fingerprints, MACCS keys, and RDKit's own topological fingerprints. The resulting feature vectors can be passed to machine-learning frameworks such as PyTorch and XGBoost.
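The sketch below covers three steps: canonicalization, InChI generation, and 3D conformer embedding with RDKit's ETKDG method. ETKDG combines distance-geometry embedding with experimental torsion-angle preferences. The sketch assumes an RDKit build with InChI support, as in the standard PyPI and conda-forge packages. The random seed and conformer count are arbitrary illustrative values, and the MMFF94 cleanup step is optional.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Two different SMILES strings for ethanol canonicalize to the same string
canon_a = Chem.MolToSmiles(Chem.MolFromSmiles("OCC"))
canon_b = Chem.MolToSmiles(Chem.MolFromSmiles("C(O)C"))

# InChI and InChIKey (requires an RDKit build with InChI support)
mol = Chem.MolFromSmiles("CCO")
inchi = Chem.MolToInchi(mol)
inchikey = Chem.MolToInchiKey(mol)

# 3D conformer generation with ETKDG; explicit hydrogens are added first
mol_h = Chem.AddHs(mol)
params = AllChem.ETKDGv3()
params.randomSeed = 42  # fixed seed for reproducible coordinates
conf_ids = AllChem.EmbedMultipleConfs(mol_h, 5, params)

# Optional force-field cleanup of each conformer with MMFF94;
# returns one (not_converged, energy) pair per conformer
results = AllChem.MMFFOptimizeMoleculeConfs(mol_h)

print(canon_a, canon_b, inchi, inchikey, len(conf_ids))
```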
RDKit is implemented in C++. Its Python bindings are built with Boost.Python, and its Java bindings are generated with SWIG. The architecture separates three layers: the core molecule graph representation, the chemistry algorithms that operate on it, and the input/output layer. The code makes extensive use of the Boost C++ Libraries. Performance-sensitive work such as conformer embedding runs in compiled C++ code, and several functions accept a thread-count argument for multithreaded execution. Prebuilt packages are distributed through Conda (conda-forge) and PyPI, and RDKit can also be deployed in Docker containers. The build is tested on continuous-integration services such as Travis CI and GitHub Actions.
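The sketch below illustrates the separation between the molecule graph, the algorithms that annotate it and the I/O layer. It walks the atoms and bonds of a parsed molecule and reads the ring information computed during sanitization. It then writes the graph out as an MDL molfile block and reads it back. Pyridine is an arbitrary example. Only the Python bindings are used; the underlying C++ classes are not shown.

```python
from rdkit import Chem

# I/O layer: parse a SMILES string (pyridine) into a molecule object
mol = Chem.MolFromSmiles("c1ccncc1")

# Graph layer: atoms are nodes and bonds are edges with typed bond orders
atoms = [(a.GetIdx(), a.GetSymbol(), a.GetIsAromatic()) for a in mol.GetAtoms()]
bonds = [
    (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), str(b.GetBondType()))
    for b in mol.GetBonds()
]

# Algorithm layer: ring perception results are stored on the molecule
n_rings = mol.GetRingInfo().NumRings()

# I/O layer again: serialize to an MDL molfile block and read it back
molblock = Chem.MolToMolBlock(mol)
roundtrip = Chem.MolFromMolBlock(molblock)

print(atoms)
print(bonds)
print(n_rings, Chem.MolToSmiles(roundtrip) == Chem.MolToSmiles(mol))
```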
RDKit is integrated into cheminformatics pipelines built with KNIME, Pipeline Pilot, and Jupyter Notebook, and it runs on cloud platforms such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure. It is used alongside molecular visualization tools such as PyMOL and UCSF Chimera and in web applications built with React or Django. An RDKit cartridge for PostgreSQL adds chemical data types and substructure and similarity search to the database. The toolkit is also commonly used to process records from public chemical databases such as PubChem, ChEMBL, and ZINC.
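The sketch below shows a typical notebook-style pipeline step. SMILES strings in a pandas DataFrame are parsed, invalid records are dropped, and canonical SMILES and descriptors are added as new columns. The compound IDs and SMILES are hypothetical stand-ins for records exported from a database such as ChEMBL, and the column names are illustrative. RDKit also ships an `rdkit.Chem.PandasTools` module for this kind of work; the sketch deliberately uses plain pandas instead.

```python
import pandas as pd
from rdkit import Chem, RDLogger
from rdkit.Chem import Descriptors

RDLogger.DisableLog("rdApp.*")  # silence parser messages for the invalid record below

# Hypothetical records standing in for a database export
df = pd.DataFrame({
    "compound_id": ["cpd-1", "cpd-2", "cpd-3"],
    "smiles": ["OCC", "c1ccccc1O", "not_a_smiles"],
})

# Parse SMILES; strings that cannot be parsed yield None
df["mol"] = df["smiles"].apply(Chem.MolFromSmiles)
valid = df[df["mol"].notna()].copy()

# Standardize the representation and add descriptor columns
valid["canonical_smiles"] = valid["mol"].apply(Chem.MolToSmiles)
valid["mol_wt"] = valid["mol"].apply(Descriptors.MolWt)
valid["logp"] = valid["mol"].apply(Descriptors.MolLogP)

print(valid[["compound_id", "canonical_smiles", "mol_wt", "logp"]])
```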
RDKit is applied in virtual screening campaigns at pharmaceutical companies including Novartis, AstraZeneca, and Roche. It is also used in materials informatics projects at institutions such as the MIT Materials Research Laboratory and in academic cheminformatics research published in journals such as Nature Communications and Scientific Reports. Use cases include:

- quantitative structure–activity relationship (QSAR) modeling for drug discovery;
- scaffold-hopping studies of the kind reported in the Journal of Medicinal Chemistry;
- compound registration systems in industry;
- patent landscaping with data from the European Patent Office;
- generative chemistry workflows built around machine-learning models, such as those developed at DeepMind and by academic groups at Harvard University.
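The sketch below covers the core of a ligand-based virtual screen and a first step of scaffold analysis. Compounds are ranked by the Tanimoto similarity of their Morgan fingerprints to a query, and each compound is reduced to its Bemis-Murcko scaffold. The query and the three-compound "library" are illustrative only; real campaigns use far larger libraries. The fingerprint type, radius and size are tunable choices.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.Chem.Scaffolds import MurckoScaffold

# Query molecule (aspirin) and a tiny illustrative screening library
query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
library = {
    "salicylic acid": "O=C(O)c1ccccc1O",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
}
lib_mols = {name: Chem.MolFromSmiles(smi) for name, smi in library.items()}

# Morgan fingerprints (radius 2, 2048 bits) for the query and the library
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
query_fp = gen.GetFingerprint(query)
names = list(lib_mols)
lib_fps = [gen.GetFingerprint(lib_mols[n]) for n in names]

# Tanimoto similarity of the query against every library fingerprint
sims = DataStructs.BulkTanimotoSimilarity(query_fp, lib_fps)
ranked = sorted(zip(names, sims), key=lambda pair: pair[1], reverse=True)

# Bemis-Murcko scaffolds: ring systems plus the linkers that connect them
scaffolds = {n: MurckoScaffold.MurckoScaffoldSmiles(mol=m) for n, m in lib_mols.items()}
query_scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=query)

for name, sim in ranked:
    print(f"{name}: Tanimoto={sim:.2f}, scaffold={scaffolds[name]}")
```

Grouping hits by scaffold, or looking for similar compounds whose scaffold differs from the query's, is one simple way to start a scaffold-hopping analysis.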
RDKit is distributed under a permissive BSD-style license, which allows use by commercial entities such as Novartis as well as by public research groups such as those at the National Institutes of Health. The community includes contributors from companies, academic labs, and independent developers. Development is coordinated through the project's GitHub repository, which hosts issue tracking, pull-request code review, and release management. Its community support channels are similar to those of projects such as scikit-learn and pandas.
Category:Cheminformatics Category:Computational chemistry