ProteomeXchange — LLMpedia

ProteomeXchange
Name	ProteomeXchange
Formation	2011
Type	Consortium
Purpose	Coordination of proteomics data sharing
Headquarters	International
Region served	Global
Languages	English

Contents

Overview
History and Development
Data Repositories and Participating Resources
Data Submission and Standardization
Access, Tools, and Services
Impact and Usage in Proteomics
Governance and Community Practices

ProteomeXchange is an international consortium that coordinates the submission, dissemination, and reuse of mass spectrometry-based proteomics data across infrastructures hosted or funded by organizations including the European Bioinformatics Institute, National Center for Biotechnology Information, Protein Information Resource, Swiss Institute of Bioinformatics, European Molecular Biology Laboratory, Wellcome Trust, National Institutes of Health, and European Commission. The consortium connects data repositories, standards organizations, research projects, funding agencies, and journals, including Nature, Science, Cell, PNAS, The Lancet, and Nature Communications, to promote open data practices in proteomics.

Overview

ProteomeXchange provides a coordinated framework linking major repositories such as PRIDE (PRoteomics IDEntifications database), MassIVE, jPOST, PeptideAtlas, GPMDB, iProX, and PASS. It integrates community standards and tooling from HUPO, HUPO-PSI, the Trans-Proteomic Pipeline, and Bioschemas, and it interfaces with infrastructures including ELIXIR, Jisc, European Grid Infrastructure, CERN, and Cloud Native Computing Foundation. Funders and publishers such as the Wellcome Trust, European Research Council, Howard Hughes Medical Institute, Max Planck Society, and John Wiley & Sons endorse ProteomeXchange guidelines to support reproducible workflows. These guidelines apply across collaborations involving the German Cancer Research Center (DKFZ), Broad Institute, Stanford University, University of Cambridge, and ETH Zurich, as well as industry partners like Thermo Fisher Scientific, Sciex, and Bruker.

History and Development

The initiative emerged from discussions at HUPO meetings and at workshops held at Cold Spring Harbor Laboratory, EMBL-EBI, and Mass Spectrometry Society conferences, and it was formalized in 2011 with participation from repositories such as PRIDE, PeptideAtlas, and GPMDB. Early milestones included the adoption of XML-based standards such as mzIdentML, developed by HUPO-PSI, and collaborations with projects like the Human Proteome Project, ProteomeTools, iHOP, and BioGRID. Subsequent expansions involved integration with large-scale projects at the Wellcome Sanger Institute, European Bioinformatics Institute, National Cancer Institute, and Human Cell Atlas, along with community efforts led by investigators at the University of Oxford, University of Washington, Massachusetts Institute of Technology, and Karolinska Institutet.

Data Repositories and Participating Resources

ProteomeXchange links a spectrum of repositories and resources, from archival platforms such as PRIDE (PRoteomics IDEntifications database), MassIVE, jPOST, iProX, and PeptideAtlas to specialized services like ProteomicsDB, GPMDB, Panorama Public, CPTAC Data Portal, and OpenProt. It interoperates with metadata registries and databases including UniProt, Ensembl, RefSeq, Gene Ontology, KEGG, Reactome, ChEBI, InterPro, PDB, Pfam, STRING, BioGRID, IntAct, ArrayExpress, and PRIDE Cluster. Collaborations extend to computational infrastructures such as Galaxy Project, Nextflow, Docker, Singularity, Kubernetes, and AWS-based services, and to annotation efforts at European Molecular Biology Laboratory, National Center for Biotechnology Information, and UniProt Consortium.

Data Submission and Standardization

Submission workflows follow standards developed by HUPO-PSI, such as mzML, mzIdentML, mzTab, and TraML, together with metadata guidelines aligned with MIAPE and with the FAIR principles advocated by GO FAIR, ELIXIR, the Research Data Alliance, and funders including the National Institutes of Health. Data submitters often use tools such as the Trans-Proteomic Pipeline, MaxQuant, Proteome Discoverer, Mascot, OpenMS, and SearchGUI to generate standardized output that repositories can ingest. The consortium promotes persistent identifiers such as Digital Object Identifiers (DOIs), ORCID iDs, and BioStudies accessions, and it uses controlled vocabularies such as the PSI-MS Controlled Vocabulary, Gene Ontology, and ChEBI for interoperable annotation across projects such as the Human Proteome Project and CPTAC.
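As an illustration of what "standardized output" can look like in practice, the following minimal sketch groups the lines of an mzTab-style text file by their section prefix. It assumes the tab-separated layout of mzTab 1.0, in which each line starts with a code such as MTD (metadata), PSH (PSM header), or PSM (PSM row). The sample text and the function name `parse_mztab` are invented for this example, and the sketch is not a validator or a replacement for a dedicated mzTab library.

```python
from collections import defaultdict

# Made-up, heavily abbreviated mzTab-like text; not real data.
SAMPLE = "\n".join([
    "COM\tIllustrative example only",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "MTD\tmzTab-type\tIdentification",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP12345",
    "PSM\tSAMPLER\t2\tP67890",
])

# Header prefix -> row prefix for the table sections of mzTab 1.0.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Return (metadata dict, {row prefix: list of row dicts})."""
    metadata = {}
    headers = {}
    rows = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "COM":
            continue  # comment line
        if prefix == "MTD":
            if len(fields) >= 3:
                metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            # Remember column names for the matching row prefix.
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            # Pair each value with its column name from the header line.
            rows[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(rows)


metadata, rows = parse_mztab(SAMPLE)
print(metadata["mzTab-type"], len(rows["PSM"]))
```

Rows that appear before their header line are silently ignored here. A stricter reader would probably report them as errors and check the mandatory metadata fields as well.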

Access, Tools, and Services

ProteomeXchange-enabled repositories provide programmatic access via APIs and bulk download services compatible with ecosystems like EBI Search, NCBI Entrez, and ProteomeCentral, and with tools such as PeptideShaker, MSFragger, Perseus, Scaffold, BLAST, and HMMER. Analytical platforms including Galaxy Project, OpenMS, Jupyter Notebook, and R/Bioconductor packages like MSnbase and DEP integrate ProteomeXchange (PX) datasets into workflows used by researchers at Harvard Medical School, Yale University, Johns Hopkins University, Imperial College London, and Weizmann Institute of Science. Cloud-enabled services from Amazon Web Services, Google Cloud Platform, and Microsoft Azure, along with national infrastructures such as SNIC, facilitate large-scale reanalysis and machine learning efforts tied to initiatives like Human Cell Atlas and ENCODE.
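Programmatic access usually starts from a dataset accession. The sketch below checks that an accession has the shape "PXD" followed by six digits and builds a ProteomeCentral lookup URL for it. The accession pattern, the `GetDataset` endpoint, and its `ID` and `outputMode` parameters are assumptions based on commonly seen ProteomeCentral links, so they should be checked against the current ProteomeCentral documentation. The helper name `dataset_url` is hypothetical, and the code makes no network request.

```python
import re
from urllib.parse import urlencode

# Assumption: accessions look like "PXD" plus six digits, e.g. PXD000001.
PXD_PATTERN = re.compile(r"PXD\d{6}")

# Assumption: ProteomeCentral dataset lookup endpoint; verify before use.
BASE_URL = "https://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession, output_mode="XML"):
    """Build a lookup URL for one accession (hypothetical helper)."""
    acc = accession.strip().upper()
    if not PXD_PATTERN.fullmatch(acc):
        raise ValueError(f"not a PXD-style accession: {accession!r}")
    # Parameter names are assumptions, not confirmed by this article.
    query = urlencode({"ID": acc, "outputMode": output_mode})
    return f"{BASE_URL}?{query}"


print(dataset_url("pxd000001"))
```

Validating the accession before building the URL keeps malformed input out of request logs and lets failures surface early. Fetching and parsing the response is left out because the response format is not described here.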

Impact and Usage in Proteomics

ProteomeXchange has catalyzed data reuse in studies published across journals like Nature Methods, Molecular & Cellular Proteomics, Journal of Proteome Research, Cell Reports, and Genome Biology. It underpins consortium projects including CPTAC, Human Proteome Project, ProteomeTools, and clinical proteomics efforts at NIH Clinical Center, Mayo Clinic, Cleveland Clinic, Memorial Sloan Kettering Cancer Center, and Karolinska University Hospital. Reuse of PX datasets supports biomarker discovery, systems biology, interactomics, and structural proteomics involving resources such as UniProt, PDB, Reactome, STRING, and KEGG, and contributes to reproducibility initiatives championed by COPE, ARRIVE, and TOP Guidelines.

Governance and Community Practices

ProteomeXchange governance involves representatives from repositories, community standards groups, funders, and publishers including HUPO, HUPO-PSI, ELIXIR, Wellcome Trust, NIH, European Commission, Nature Research, and Oxford University Press. Best practices and data policies are aligned with recommendations from GO FAIR, Research Data Alliance, Committee on Publication Ethics, and institutional policies at University of Cambridge, Imperial College London, Max Planck Society, and CNRS. Community engagement occurs through workshops at venues like Cold Spring Harbor Laboratory, EMBL-EBI, HUPO Congress, ISMB, and ASMS Annual Conference to refine submission guidelines, metadata standards, and accreditation pathways for public proteomics data sharing.

Category:Proteomics Category:Bioinformatics Category:Data sharing