PRIDE (PRoteomics IDEntifications database)

Name	PRIDE
Title	PRIDE (PRoteomics IDEntifications database)
Discipline	Proteomics
Country	United Kingdom
Established	2004
Hosted by	European Bioinformatics Institute

Contents

Overview
History and Development
Data Content and Structure
Submission and Curation Processes
Access, Tools, and Integration
Impact and Use in Proteomics Research

PRIDE (PRoteomics IDEntifications database) is an archival repository for mass spectrometry-based proteomics data that aggregates peptide and protein identifications, quantitative results, and metadata. Founded to support reproducible science, the resource connects datasets from laboratories, consortia, journals, and funding agencies to large bioinformatics infrastructures and community standards. PRIDE interoperates with repositories, tools, and initiatives across biomedical and computational biology landscapes to enable reanalysis, meta-analysis, and method development.

Overview

PRIDE provides a platform for data deposition and dissemination. Its users include researchers associated with research institutions such as European Bioinformatics Institute, Wellcome Trust Sanger Institute, Max Planck Society, National Institutes of Health, Harvard Medical School, Stanford University, Massachusetts Institute of Technology, Broad Institute, European Molecular Biology Laboratory, University of Cambridge, University of Oxford, University of California, San Francisco, Johns Hopkins University, Columbia University, Karolinska Institutet, University of Toronto, McGill University, ETH Zurich, University College London, Imperial College London, University of Tokyo, Seoul National University, Peking University, Tsinghua University, Chinese Academy of Sciences, Max Delbrück Center for Molecular Medicine, Cold Spring Harbor Laboratory, Spanish National Research Council, Italian National Research Council, Korea Research Institute of Bioscience and Biotechnology, and Russian Academy of Sciences.

They are also associated with funding bodies such as European Research Council, Wellcome Trust, National Science Foundation, European Commission, Biotechnology and Biological Sciences Research Council, Human Frontier Science Program, Gordon and Betty Moore Foundation, Howard Hughes Medical Institute, Medical Research Council, Agence Nationale de la Recherche, Science Foundation Ireland, Australian Research Council, Canadian Institutes of Health Research, National Natural Science Foundation of China, Japan Society for the Promotion of Science, German Research Foundation, Netherlands Organization for Scientific Research, Swiss National Science Foundation, Fonds de la Recherche Scientifique (FNRS), Brazilian National Council for Scientific and Technological Development, Indian Council of Medical Research, and Genome Canada, and with open-source software projects such as Bioconductor.

History and Development

PRIDE originated from efforts led by groups at European Bioinformatics Institute and collaborators at Institute of Molecular Pathology, European Molecular Biology Laboratory, and University of Manchester to address data-sharing mandates from journals such as Nature, Science, Cell, Proceedings of the National Academy of Sciences, and The Lancet. Its timeline includes milestones tied to standards from Human Proteome Organization and its Proteomics Standards Initiative, and to infrastructure integration with UniProt, Ensembl, Gene Ontology Consortium, ArrayExpress, Expression Atlas, and BioSamples. PRIDE development paralleled initiatives such as Human Proteome Project, The Cancer Genome Atlas, ENCODE Project, and 1000 Genomes Project, and involved collaborations with the developers of software such as MaxQuant, Proteome Discoverer, Mascot, OpenMS, PeptideShaker, and Skyline. Funding and governance involved organizations such as Wellcome Trust, European Research Council, and National Institutes of Health.

Data Content and Structure

PRIDE stores mass spectrometry outputs including spectral files, peptide-spectrum matches, protein inference results, and quantitative matrices. These are generated on instruments from vendors such as Thermo Fisher Scientific, Sciex, Bruker, Agilent Technologies, and Waters Corporation, and are exchanged in community formats such as mzML, mzIdentML, mzTab, and TraML. Metadata combine sample and experimental descriptors that reference ontologies such as the Ontology for Biomedical Investigations, the Gene Ontology, and Chemical Entities of Biological Interest (ChEBI). Identifiers are cross-referenced to UniProtKB, RefSeq, Ensembl, ChEBI, PubMed, and Digital Object Identifiers, and to catalogue entries from European Nucleotide Archive, UniParc, and other EMBL-EBI resources. Datasets range from clinical cohorts at centers such as Mayo Clinic, Cleveland Clinic, Karolinska University Hospital, and Mount Sinai Hospital to systems biology studies produced by consortia such as Human Cell Atlas and International Cancer Proteogenome Consortium, and datasets shared through the ProteomeXchange Consortium.
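Of the formats named above, mzTab is the simplest to inspect by hand: it is a tab-separated text format in which the first field of every line is a short prefix identifying the section (for example MTD for metadata, PRH/PRT for the protein header and rows, PSH/PSM for the peptide-spectrum match header and rows, and COM for comments). The following is a minimal sketch of how such a file could be split into sections. The fragment in EXAMPLE_MZTAB is invented for illustration and omits most columns that the specification requires, and parse_mztab is a hypothetical helper, not part of any PRIDE tool.

```python
from collections import defaultdict

# Invented, heavily simplified fragment; real mzTab files carry many more
# mandatory columns and metadata keys.
EXAMPLE_MZTAB = "\n".join([
    "COM\tIllustrative fragment only",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tToy example",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP12345",
    "PSM\tSAMPLER\t2\tP12345",
])

# Maps each row prefix to the header prefix that names its columns.
HEADER_FOR = {"PRT": "PRH", "PEP": "PEH", "PSM": "PSH", "SML": "SMH"}


def parse_mztab(text):
    """Split mzTab text into a metadata dict and per-section row dicts."""
    metadata = {}
    headers = {}
    rows = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        prefix, values = fields[0], fields[1:]
        if prefix == "COM":
            continue  # comment lines carry no data
        if prefix == "MTD":
            if len(values) >= 2:
                metadata[values[0]] = values[1]
        elif prefix in HEADER_FOR.values():
            headers[prefix] = values
        elif prefix in HEADER_FOR:
            header = headers.get(HEADER_FOR[prefix])
            if header is None:
                raise ValueError(f"{prefix} row found before its header line")
            rows[prefix].append(dict(zip(header, values)))
    return metadata, dict(rows)


metadata, rows = parse_mztab(EXAMPLE_MZTAB)
print(metadata["mzTab-version"], len(rows["PSM"]))
```

A production reader would also validate mandatory columns and handle optional ones; dedicated libraries are a better choice than this sketch for real datasets.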

Submission and Curation Processes

Submitters, often affiliated with laboratories at University of California, Berkeley, Princeton University, Yale University, Duke University, University of Pennsylvania, Scripps Research, RIKEN, La Jolla Institute for Immunology, and Howard Hughes Medical Institute, follow workflows to deposit raw and processed data, metadata, and experiment descriptions. Curation aligns submissions with community standards promulgated by the Proteomics Standards Initiative and applies quality checks influenced by practices at European Bioinformatics Institute and National Center for Biotechnology Information. Editorial interactions may involve journal editors from Nature Communications, Scientific Reports, Journal of Proteome Research, Molecular & Cellular Proteomics, and Proteomics. Data licensing and access terms reflect the policies of funders such as Wellcome Trust and European Commission.
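The four kinds of material named above (raw data, processed data, metadata, and experiment descriptions) suggest a simple completeness check before deposition. The sketch below is purely illustrative: the manifest structure, the component labels, and the missing_components helper are all assumptions made for this example and do not describe PRIDE's actual submission tool or file categories.

```python
# Hypothetical component labels, taken from the prose above rather than from
# any official PRIDE file-type vocabulary.
REQUIRED_COMPONENTS = ("raw", "processed", "metadata", "description")


def missing_components(manifest):
    """Return the required components that no manifest entry provides.

    `manifest` is a list of dicts, each with a "file" name and a
    "component" label (a made-up structure for this sketch).
    """
    present = {entry["component"] for entry in manifest}
    return [name for name in REQUIRED_COMPONENTS if name not in present]


# Invented example manifest with no experiment description yet.
example_manifest = [
    {"file": "run01.raw", "component": "raw"},
    {"file": "run01.mzid", "component": "processed"},
    {"file": "samples.tsv", "component": "metadata"},
]

print(missing_components(example_manifest))
```

Returning the missing labels in a fixed order, rather than a bare true/false, lets a submitter see at once what still needs to be added.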

Access, Tools, and Integration

PRIDE offers programmatic access and web interfaces that interoperate with data resources such as ProteomeXchange Consortium, Ensembl, UniProt, Expression Atlas, PeptideAtlas, and MassIVE, and with tools and platforms such as Galaxy, OpenMS, MSConvert, PRIDE Inspector, PeptideShaker, MaxQuant, Perseus, Skyline, Trans-Proteomic Pipeline, and Comet, following standards from the Proteomics Standards Initiative (PSI). Integration facilitates reanalysis workflows on compute infrastructures such as European Open Science Cloud, XSEDE, Amazon Web Services, Google Cloud Platform, and HPC centers. It also connects to community resources including GitHub, Zenodo, Figshare, ORCID, and CrossRef, in line with the FAIR Principles and Research Councils UK policies.
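As an illustration of the programmatic access mentioned above, the sketch below builds the address of a project record from a dataset accession. Two things here are assumptions rather than facts taken from this article: the BASE_URL value and the /projects/ path reflect one version of the PRIDE Archive web service and may have changed, and the accession check assumes the ProteomeXchange-style form of PXD followed by six digits. The project_url helper is hypothetical, and no network request is made.

```python
import re
from urllib.parse import quote

# ASSUMPTION: base address and path layout of the PRIDE Archive web service.
# Check the current API documentation before relying on either.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2"

# ASSUMPTION: accessions look like "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def project_url(accession, base_url=BASE_URL):
    """Return the (assumed) URL of a project record for a PXD accession."""
    normalized = accession.strip().upper()
    if not PXD_PATTERN.fullmatch(normalized):
        raise ValueError(f"not a PXD-style accession: {accession!r}")
    return f"{base_url}/projects/{quote(normalized)}"


print(project_url("pxd000001"))
```

The resulting URL could then be fetched with any HTTP client; validating the accession first avoids sending malformed requests to the service.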

Impact and Use in Proteomics Research

PRIDE underpins studies by investigators at Broad Institute, Wellcome Trust Sanger Institute, European Molecular Biology Laboratory, University of Cambridge, Stanford University, Harvard Medical School, Massachusetts Institute of Technology, and Johns Hopkins University, and by numerous consortia including the Human Proteome Project and ProteomeXchange Consortium. It has enabled meta-analyses that inform biomarker discovery in collaborations with National Cancer Institute, European Medicines Agency, and World Health Organization, as well as translational studies at Roche, Novartis, Pfizer, GlaxoSmithKline, AstraZeneca, Merck & Co., Johnson & Johnson, Sanofi, Bayer, AbbVie, Eli Lilly and Company, Takeda Pharmaceutical Company, and Amgen. The archive supports methodological advances cited alongside works by researchers such as Ruedi Aebersold, Lars Malmström, Matthias Mann, Emma Lundberg, Shawn McGlynn, John Yates, Henrik Nielsen, Julia Chamot-Rooke, Sylvia Aringhieri, Benjamin Neale, George Church, and Eric Lander, and it links to datasets used in publications in Nature, Cell, Science, Cell Systems, Nature Biotechnology, and Nature Methods.
