PRIDE Archive — LLMpedia

PRIDE Archive
Name	PRIDE Archive
Established	2006
Type	Biological data repository
Owner	EMBL-EBI
Country	United Kingdom
Discipline	Proteomics

Contents

Overview
History and Development
Data Content and Coverage
Submission and Curation Processes
Access, Tools, and Format Standards
Impact and Use in Research

PRIDE Archive is a public repository for mass spectrometry-based proteomics datasets that supports reproducible research in molecular and cellular biology. It serves as a centralized resource for researchers from institutions such as the European Bioinformatics Institute, Wellcome Sanger Institute, Max Planck Society, Harvard Medical School, and Stanford University, enabling deposition, dissemination, and re-use of peptide and protein identifications, quantification results, and associated metadata. PRIDE Archive interacts with complementary resources including UniProt, Ensembl, Protein Data Bank, Gene Ontology, and ArrayExpress to integrate proteomics evidence with genomic and structural annotations.

Overview

PRIDE Archive accepts experimental outputs from instruments made by manufacturers such as Thermo Fisher Scientific, SCIEX, and Bruker, and from analysis software such as MaxQuant and OpenMS. The repository interoperates with the ProteomeXchange Consortium and with HUPO Proteomics Standards Initiative formats such as mzML, mzIdentML, and mzTab. PRIDE Archive supports datasets produced by labs affiliated with organizations including Cold Spring Harbor Laboratory, Broad Institute, European Molecular Biology Laboratory, Karolinska Institutet, University of Oxford, University of Cambridge, Massachusetts Institute of Technology, Yale University, Columbia University, University of Tokyo, Peking University, Weizmann Institute of Science, ETH Zurich, University of Melbourne, Monash University, University of Toronto, University of California, San Diego, Johns Hopkins University, Imperial College London, National Institutes of Health, Wellcome Centre for Human Genetics, European Molecular Biology Laboratory-Hamburg, RIKEN, Shanghai Jiao Tong University, Seoul National University, University College London, Duke University, University of Pennsylvania, Vanderbilt University Medical Center, Scripps Research, Ludwig Institute for Cancer Research, Fred Hutchinson Cancer Research Center, MRC Laboratory of Molecular Biology, Institut Pasteur, CNRS, University of Copenhagen, Karolinska University Hospital, University of Barcelona, Max Delbrück Center, University of Zurich, University of Basel, Leiden University Medical Center, University of Amsterdam, University of Geneva, Universität Heidelberg, Technische Universität München, Universidad Autónoma de Madrid, Universidade de São Paulo, University of British Columbia, University of Auckland, University of Illinois Urbana-Champaign, Princeton University, Cornell University, University of Chicago, University of Michigan, Northwestern University, and University of California, Los Angeles.

History and Development

PRIDE Archive originated from initiatives at the European Bioinformatics Institute and within the ProteomeXchange Consortium to address reproducibility concerns highlighted by studies published in Nature and Science and by community gatherings such as the HUPO World Congress. Early development drew on standards created by Human Proteome Organization working groups and on collaborations with software projects including OpenMS and the Trans-Proteomic Pipeline. Major milestones include formal adoption of standardized dataset identifiers through the ProteomeXchange Consortium, integration of cross-references to the UniProt protein knowledgebase, and expansion of submission tools following recommendations from workshops at the European Molecular Biology Laboratory and the Wellcome Trust. PRIDE Archive evolved through grant support and partnerships with funders such as the European Research Council, Wellcome Trust, National Institutes of Health, and Biotechnology and Biological Sciences Research Council.

Data Content and Coverage

Content spans identifications and quantifications from experiments on diseases and conditions studied at centers such as MD Anderson Cancer Center, Mayo Clinic, Cleveland Clinic, Karolinska Institutet, and Royal Marsden Hospital. Dataset types include shotgun proteomics, targeted proteomics (SRM/MRM), data-independent acquisition (DIA/SWATH), phosphoproteomics, glycoproteomics, and top-down proteomics, acquired on instruments from Thermo Fisher Scientific, SCIEX, and Bruker. PRIDE Archive links peptide evidence to entries in databases such as UniProt, Ensembl, RefSeq, Reactome, KEGG, PDB, InterPro, and Pfam. Taxonomic coverage follows NCBI Taxonomy and spans model organisms, pathogens, crops, and livestock, with datasets from Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Escherichia coli, Mycobacterium tuberculosis, Plasmodium falciparum, Zea mays, Oryza sativa, Danio rerio, Xenopus laevis, and Bos taurus.

Submission and Curation Processes

Submitters come from institutions such as University of Oxford, Stanford University, Harvard Medical School, University of California, San Diego, University of Cambridge, Max Planck Society, ETH Zurich, and CNRS. Submissions follow ProteomeXchange guidelines and require metadata conforming to standards developed by the HUPO Proteomics Standards Initiative, with data files in formats such as mzML, mzIdentML, and mzTab. Curation workflows combine automated validation with manual review by curators at the European Bioinformatics Institute and collaborating centers. Controlled vocabularies and ontologies from Gene Ontology, PSI-MS, BRENDA Tissue Ontology, and Chemical Entities of Biological Interest are used to standardize annotations. Embargo and access options align with publisher policies from outlets such as Nature Biotechnology, Cell, Molecular & Cellular Proteomics, Journal of Proteome Research, and PNAS.
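As a rough illustration of what an automated format check on a submitted mzTab file could look like, the following Python sketch groups the tab-separated lines of an mzTab document by their three-letter line prefix and reports missing metadata keys. It is not PRIDE's actual validation pipeline: the function names, the choice of which metadata keys to treat as required, and the example content (including the placeholder accession P12345) are assumptions made for this example.

```python
from collections import defaultdict

# Line prefixes used by the mzTab format: metadata (MTD), header and data rows
# for proteins (PRH/PRT), peptides (PEH/PEP), PSMs (PSH/PSM) and small
# molecules (SMH/SML), plus comments (COM).
KNOWN_PREFIXES = {"MTD", "PRH", "PRT", "PEH", "PEP", "PSH", "PSM", "SMH", "SML", "COM"}

# Assumption: this sketch treats these three metadata keys as required.
REQUIRED_METADATA = ("mzTab-version", "mzTab-mode", "mzTab-type")


def split_sections(text):
    """Group tab-separated mzTab lines by their line prefix."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate sections
        fields = line.split("\t")
        sections[fields[0]].append(fields[1:])
    return sections


def check_mztab(text):
    """Return a list of problems found; an empty list means none were found."""
    sections = split_sections(text)
    problems = [
        f"unknown line prefix: {prefix}"
        for prefix in sorted(sections)
        if prefix not in KNOWN_PREFIXES
    ]
    # Metadata rows have the shape: MTD <key> <value>
    metadata = {row[0]: row[1] for row in sections.get("MTD", []) if len(row) >= 2}
    problems += [
        f"missing metadata key: {key}"
        for key in REQUIRED_METADATA
        if key not in metadata
    ]
    return problems


# Hypothetical file content; the mzTab-type line is deliberately left out.
example = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "PRH\taccession\tdescription",
    "PRT\tP12345\texample protein",
])
print(check_mztab(example))  # ['missing metadata key: mzTab-type']
```

A real validator would also check column headers, controlled-vocabulary terms, and cross-references between sections; this sketch only shows the general shape of a line-oriented check.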

Access, Tools, and Format Standards

PRIDE Archive provides web interfaces and programmatic access, and its data can be used with tools and platforms including ProteomeXchange Consortium endpoints, PRIDE Inspector, PeptideShaker, MaxQuant, OpenMS, Trans-Proteomic Pipeline, and Galaxy. Data are distributed in community formats such as mzML, mzIdentML, and mzTab, and in raw vendor formats from Thermo Fisher Scientific, SCIEX, and Bruker. Cross-resource integration supports queries via UniProt, Ensembl, Reactome, IntAct, and BioSamples. Visualization and reprocessing pipelines rely on software such as Perseus, Skyline, OpenSWATH, DIA-NN, and Spectronaut.
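Programmatic access of the kind described above might look like the following Python sketch, which validates a ProteomeXchange dataset accession, builds a project URL, and extracts a couple of fields from a JSON record. The base URL, the `/projects/{accession}` path, and the `accession` and `title` field names are assumptions for illustration and should be checked against the current PRIDE Archive API documentation; the six-digit accession pattern reflects PXD identifiers as issued at the time of writing.

```python
import json
import re
import urllib.request

# Assumption: base URL of the PRIDE Archive REST API; verify before use.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2"

# ProteomeXchange dataset accessions: "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def project_url(accession):
    """Build the (assumed) URL for a single project record."""
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{BASE_URL}/projects/{accession}"


def fetch_project(accession, timeout=30):
    """Download and decode one project record as a dict (needs network access)."""
    with urllib.request.urlopen(project_url(accession), timeout=timeout) as response:
        return json.load(response)


def summarize(record):
    """Keep a few fields of interest; the field names are assumptions."""
    return {"accession": record.get("accession"), "title": record.get("title")}


# Usage (performs a network request, so it is left commented out here):
# print(summarize(fetch_project("PXD000001")))
```

Validating the accession before building the URL keeps malformed input from reaching the server, and separating `summarize` from `fetch_project` lets the field handling be exercised on saved responses without network access.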

Impact and Use in Research

PRIDE Archive has enabled studies by groups at the Broad Institute, Wellcome Sanger Institute, NIH, University of Cambridge, Harvard Medical School, Stanford University, ETH Zurich, and Max Planck Society that advance proteogenomics, biomarker discovery, and systems biology. Re-analyses of PRIDE datasets have contributed to annotations in UniProt, aided curation in Reactome, and supported meta-analyses published in Nature Communications, Genome Biology, Cell Reports, Molecular & Cellular Proteomics, and Nature Biotechnology. The archive underpins large-scale projects such as the Human Proteome Project, proteogenomic efforts at CPTAC, and organismal proteome mapping at institutes including EMBL and the Wellcome Sanger Institute. Its datasets inform translational work in oncology at MD Anderson Cancer Center and cardiovascular research at Mayo Clinic, and support method development for software projects such as OpenMS, Perseus, and DIA-NN.
