ProteomicsDB — LLMpedia

ProteomicsDB
Name	ProteomicsDB
Released	2014
Developed by	Technical University of Munich, SAP
Type	proteomics database
Access	public

Contents

Overview
Data Content and Coverage
Architecture and Technology
Tools and User Interface
Data Access and Integration
Applications and Use Cases
Governance and Curation

ProteomicsDB is an open-access proteomics database developed as a high-dimensional resource for protein-centric quantitative data. It integrates mass spectrometry data, expression atlases, and biochemical assays to support research associated with the Human Genome Project, Cancer Research UK, the National Institutes of Health, and the European Research Council, as well as industrial projects at Bayer AG and Roche. The resource is used in studies connected to the International Human Epigenome Consortium, the Human Protein Atlas, UniProt, PRIDE (PRoteomics IDEntifications Database), and ProteomeXchange.

Overview

ProteomicsDB was initiated to aggregate large-scale protein abundance, modification, and interaction data from studies at institutions such as the Max Planck Institute for Biochemistry, the Broad Institute, and the Wellcome Sanger Institute, and from collaborations with Novartis and Pfizer. It complements resources such as Ensembl, GenBank, RefSeq, PDB, and Gene Ontology by focusing on quantitative proteomics data produced by laboratories at ETH Zurich, the University of Cambridge, Harvard Medical School, Stanford University, and the University of Oxford. The database supports reproducible workflows used by consortia such as the Human Proteome Organization and benchmarking initiatives from the National Center for Biotechnology Information.

Data Content and Coverage

ProteomicsDB stores peptide-spectrum matches, protein quantifications, post-translational modifications, and spectral libraries contributed by groups including the European Molecular Biology Laboratory, Cold Spring Harbor Laboratory, the Max Delbrück Center for Molecular Medicine, Karolinska Institutet, the University of California, San Francisco, Johns Hopkins University, and Massachusetts General Hospital. Coverage spans tissues and cell lines studied at Dana-Farber Cancer Institute, Memorial Sloan Kettering Cancer Center, the Technical University of Munich, and Heidelberg University Hospital, along with pharmaceutical datasets from GlaxoSmithKline. Data types align with HUPO Proteomics Standards Initiative formats such as mzML and mzIdentML, and with submitter pipelines used at EMBL-EBI and the PRIDE Archive. Cross-references include mappings to identifiers in UniProtKB, Ensembl, HGNC, and NCBI Gene, and structural links to entries in the Protein Data Bank.
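The relationship between peptide-spectrum matches and protein-level quantification can be illustrated with a minimal Python sketch. The class name, field names, and example values below are hypothetical and are not ProteomicsDB's actual schema, and summing intensities per protein is a deliberate simplification rather than the database's quantification method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideSpectrumMatch:
    """One peptide-spectrum match; all field names are hypothetical."""
    spectrum_id: str
    peptide: str
    protein_accession: str     # e.g. a UniProtKB accession
    score: float               # identification confidence, higher is better
    intensity: float           # measured signal for this match
    modifications: tuple = ()  # free-text modification labels, if any


def summed_intensity_per_protein(psms, min_score=0.0):
    """Sum PSM intensities per protein, keeping PSMs scoring at least min_score."""
    totals = defaultdict(float)
    for psm in psms:
        if psm.score >= min_score:
            totals[psm.protein_accession] += psm.intensity
    return dict(totals)


# Made-up example records; peptides, scores, and intensities are illustrative only.
example_psms = [
    PeptideSpectrumMatch("scan_1", "PEPTIDEK", "P04637", 0.95, 1000.0),
    PeptideSpectrumMatch("scan_2", "SAMPLER", "P04637", 0.40, 500.0),
    PeptideSpectrumMatch("scan_3", "LIVERK", "P38398", 0.90, 250.0),
]

if __name__ == "__main__":
    print(summed_intensity_per_protein(example_psms, min_score=0.5))
```

Filtering by score before aggregation mirrors the common practice of discarding low-confidence identifications; real pipelines typically control the false discovery rate rather than applying a fixed cutoff.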

Architecture and Technology

The backend employs scalable architectures influenced by implementations at Google and Amazon Web Services and by research infrastructures at the European Bioinformatics Institute and in Deutsche Forschungsgemeinschaft projects. Its technologies reflect practices associated with Apache Hadoop, Apache Spark, Docker, and Kubernetes, and with visualization libraries such as D3.js. Security and authentication models reference protocols used by ORCID, ELIXIR, and initiatives aligned with the FAIR principles and supported by European Commission grants. Integration patterns borrow from APIs designed for National Institutes of Health and Wellcome Trust data platforms.

Tools and User Interface

The web interface offers search, visualization, and download tools comparable to portals such as the UCSC Genome Browser, the Ensembl Genome Browser, the Human Protein Atlas, and STRING, with interactive charts echoing designs from Broad Institute applications. Query features support comparisons used in studies at the Scripps Research Institute, the Max Delbrück Center, and the Novo Nordisk Foundation, and in educational resources from Cold Spring Harbor Laboratory. Analytical modules accommodate workflows based on SEQUEST, Mascot, and MaxQuant, and pipelines developed at the European Bioinformatics Institute and by ProteomeXchange partners.

Data Access and Integration

Users retrieve data through programmatic APIs and bulk downloads modeled after services at UniProt, NCBI, EBI, and Ensembl. Integration with external systems relies on identifier mappings and converters used by HGNC, RefSeq, and PDB, and by pathway resources such as Reactome, KEGG, and MetaCore. Interoperability supports cross-database queries linking to clinical and translational datasets from The Cancer Genome Atlas, the Genotype-Tissue Expression Project, UK Biobank, and cohort studies such as the Framingham Heart Study. Data submission and metadata standards follow guidelines from HUPO and ELIXIR, and regulatory frameworks influenced by European Medicines Agency practices.
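Identifier mapping of the kind described here can be sketched as a small lookup table keyed by UniProtKB accession. The table layout and the function names `map_identifier` and `find_accession` are assumptions made for illustration and do not represent an actual ProteomicsDB API; the two entries are included only as sample data.

```python
# Illustrative cross-reference table keyed by UniProtKB accession.
# The layout is an assumption for this sketch, not a real service schema.
XREFS = {
    "P04637": {
        "gene_symbol": "TP53",
        "hgnc": "HGNC:11998",
        "ensembl": "ENSG00000141510",
        "ncbi_gene": "7157",
    },
    "P38398": {
        "gene_symbol": "BRCA1",
        "hgnc": "HGNC:1100",
        "ensembl": "ENSG00000012048",
        "ncbi_gene": "672",
    },
}


def map_identifier(accession, target):
    """Return the identifier of type `target` for a UniProtKB accession, or None."""
    entry = XREFS.get(accession)
    if entry is None:
        return None
    return entry.get(target)


def find_accession(source, identifier):
    """Reverse lookup: return the UniProtKB accession whose `source` field matches."""
    for accession, entry in XREFS.items():
        if entry.get(source) == identifier:
            return accession
    return None


if __name__ == "__main__":
    print(map_identifier("P04637", "ensembl"))
    print(find_accession("gene_symbol", "BRCA1"))
```

Returning `None` for unknown accessions or identifier types keeps the failure mode explicit, which matters when a query spans databases whose identifier coverage differs.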

Applications and Use Cases

Researchers apply ProteomicsDB data in biomarker discovery pipelines similar to projects at Memorial Sloan Kettering Cancer Center, MD Anderson Cancer Center, and Mayo Clinic, and to consortium studies such as those of the International Cancer Genome Consortium. Pharmaceutical teams at AstraZeneca, Bristol Myers Squibb, and Eli Lilly and Company use the resource for target deconvolution and drug-response profiling analogous to initiatives at Genentech and Amgen. Academic groups at ETH Zurich, Karolinska Institutet, University College London, and the University of Toronto employ the database for network analysis linked to STRING and for pathway annotation in Reactome or KEGG contexts. Education and training programs at Cold Spring Harbor Laboratory and EMBL reference ProteomicsDB-style datasets in workshops.

Governance and Curation

Governance involves collaborations among institutions such as the Max Planck Society, the European Molecular Biology Laboratory, and the Helmholtz Association, and funding agencies such as the German Research Foundation and the European Research Council. Curation policies reflect community standards promulgated by the HUPO Proteomics Standards Initiative and ELIXIR, and practices adopted by EMBL-EBI and NCBI curation teams. Data stewardship coordinates with consortia including the Human Proteome Organization and ProteomeXchange, and with national infrastructures such as the German Network for Bioinformatics Infrastructure.

Category:Biological databases