PSI-MI
Name	PSI-MI
Caption	Molecular Interaction Standard
Formation	2004
Location	International
Focus	Molecular interaction data exchange

Contents

Overview
History and Development
Data Model and Format
Controlled Vocabularies and Identifiers
Tools, Implementations, and Databases
Adoption and Applications
Challenges and Future Directions

PSI-MI (Proteomics Standards Initiative Molecular Interaction) is a community standard for exchanging molecular interaction data, enabling interoperability among bioinformatics resources and providers such as UniProt, the European Bioinformatics Institute, the National Center for Biotechnology Information, Reactome, and BioGRID. The standard supports integration with resources including the Protein Data Bank, IntAct, STRING, MINT, and DIP, promoting consistent annotation used by projects like Ensembl, KEGG, the Gene Ontology Consortium, and Swiss-Prot.

Overview

PSI-MI provides a structured schema and controlled vocabularies adopted by groups such as the International Society for Computational Biology, the Human Proteome Organization, the European Molecular Biology Laboratory, the Wellcome Trust Sanger Institute, and the European Research Council. The schema and vocabularies harmonize molecular interaction records that link to experimental datasets from the European Nucleotide Archive, ArrayExpress, GenBank, The Cancer Genome Atlas, and the ENCODE Project. Developed under the HUPO Proteomics Standards Initiative, PSI-MI complements related standards and guidelines such as BioPAX, SBML, MIAPE, and the FAIR Principles, facilitating data sharing between platforms like Cytoscape, Gephi, UCSC Genome Browser, and Galaxy Project.

History and Development

The initiative emerged from collaborations among institutions such as the European Bioinformatics Institute (EMBL-EBI), Cold Spring Harbor Laboratory, the Broad Institute, and the Max Planck Institute for Molecular Genetics, following workshops involving stakeholders from the Wellcome Trust, the National Institutes of Health, the European Commission, the European Molecular Biology Organization, and the Japan Science and Technology Agency. Early milestones were influenced by projects at the Protein Information Resource, the Swiss Institute of Bioinformatics, the Institute for Systems Biology, and Institut Pasteur. Subsequent governance included contributors affiliated with Stanford University, Harvard Medical School, MIT, UC Berkeley, Yale University, the University of Cambridge, the University of Oxford, and Karolinska Institutet.

Data Model and Format

The PSI-MI data model specifies entities such as interactors, interactions, experiments, and participants, interoperating with identifiers from UniProt, NCBI Gene, HGNC, Ensembl, and RefSeq. Serialization formats include PSI-MI XML, used by IntAct, and the tab-delimited MITAB format, compatible with BioGRID; both enable visualization in Cytoscape and analysis with tools from EMBL-EBI and the Broad Institute. Cross-references are provided to resources such as the Protein Data Bank, Pfam, InterPro, SCOP, and CATH, while alignment with ontologies from the Gene Ontology Consortium, Sequence Ontology, BRENDA Tissue Ontology, and Cell Ontology supports semantic integration.
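The tab-delimited serialization can be illustrated with a minimal Python sketch that reads one record in the 15-column MITAB 2.5 layout. The short field names, the function name `parse_mitab25_line`, and the sample record (placeholder accessions, author, and PubMed identifier) are assumptions made for illustration and are not part of the standard. The sketch also assumes that `|` never occurs inside quoted text, which a production parser would need to handle.

```python
# Short names for the 15 MITAB 2.5 columns; the names are this sketch's own.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publications",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidences",
]


def parse_mitab25_line(line):
    """Split one MITAB 2.5 line into a dict of column name -> list of values."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, found {len(cells)}")
    record = {}
    for name, cell in zip(MITAB25_COLUMNS, cells):
        # "-" marks an empty cell; "|" separates multiple values in one cell.
        record[name] = [] if cell == "-" else cell.split("|")
    return record


# Hypothetical record: accessions, author, and PubMed id are placeholders.
example = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2004)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "-",
])

record = parse_mitab25_line(example)
print(record["id_a"], record["detection_methods"])
```

Returning every column as a list keeps single-valued and multi-valued cells uniform, so downstream code does not need to special-case either.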

Controlled Vocabularies and Identifiers

PSI-MI maintains controlled vocabularies for interaction types, detection methods, participant identification, and biological roles, coordinating with authorities such as International Nucleotide Sequence Database Collaboration, HUGO Gene Nomenclature Committee, UniProt Consortium, NCBI Taxonomy, and Ontology Lookup Service. Terms map to entries in Gene Ontology Consortium, ChEBI, Disease Ontology, Reactome, and Medical Subject Headings to ensure consistency across datasets contributed by ELIXIR, GOBLET, BioSamples, and ProteomeXchange.
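In tab-delimited records, controlled vocabulary terms typically appear in the form `database:"accession"(name)`, for example `psi-mi:"MI:0018"(two hybrid)`. The following Python sketch splits such a field into its parts. The `CvTerm` type and `parse_cv_field` function are hypothetical names introduced here, and the regular expression is a simplification that assumes a quoted accession and an optional trailing name in parentheses.

```python
import re
from typing import NamedTuple, Optional


class CvTerm(NamedTuple):
    database: str          # e.g. "psi-mi"
    accession: str         # e.g. "MI:0018"
    name: Optional[str]    # human-readable label, if present


# database:"accession" optionally followed by (name); a simplified pattern.
_CV_FIELD = re.compile(r'^(?P<db>[^:]+):"(?P<acc>[^"]+)"(?:\((?P<name>.*)\))?$')


def parse_cv_field(field):
    """Parse a field such as psi-mi:"MI:0018"(two hybrid) into a CvTerm."""
    match = _CV_FIELD.match(field)
    if match is None:
        raise ValueError(f"not a quoted controlled-vocabulary field: {field!r}")
    return CvTerm(match["db"], match["acc"], match["name"])


term = parse_cv_field('psi-mi:"MI:0018"(two hybrid)')
print(term.accession, term.name)
```

Comparing records by accession rather than by name avoids mismatches when two sources label the same term slightly differently.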

Tools, Implementations, and Databases

Databases adopting the standard include IntAct, BioGRID, STRING, MINT, DIP, iRefIndex, Mentha, HPRD, and members of the IMEx Consortium, while PSICQUIC (the PSI Common Query Interface) provides a common query interface across providers. Software tooling includes parsers and exporters developed in collaboration with teams at the European Bioinformatics Institute, University College London, EMBL, the SIB Swiss Institute of Bioinformatics, and the Wellcome Sanger Institute. Visualization is supported in Cytoscape, NAViGaTOR, and Gephi, and programmatic access is available via APIs used by Ensembl, UniProt, NCBI, and Reactome.
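Programmatic access through a PSICQUIC-style REST service can be sketched as a URL builder. This is an illustration under stated assumptions: the path layout `search/query/<query>` and the `format`, `firstResult`, and `maxResults` parameters reflect a common reading of the PSICQUIC REST interface and should be checked against the documentation of the service actually used. The base URL below is a placeholder, and `build_psicquic_query_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def build_psicquic_query_url(base_url, query, fmt="tab25",
                             first_result=0, max_results=50):
    """Build a query URL for a PSICQUIC-style REST endpoint (assumed layout)."""
    # Percent-encode the whole query so spaces and colons survive in the path.
    path = f"{base_url.rstrip('/')}/search/query/{quote(query, safe='')}"
    params = urlencode({
        "format": fmt,                # assumed parameter: output format
        "firstResult": first_result,  # assumed parameter: paging offset
        "maxResults": max_results,    # assumed parameter: page size
    })
    return f"{path}?{params}"


# Placeholder base URL; substitute the address of a real provider.
url = build_psicquic_query_url(
    "https://example.org/psicquic/webservices/current", "brca2"
)
print(url)
```

The sketch only constructs the URL; fetching and parsing the response is left to the caller.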

Adoption and Applications

PSI-MI is employed in studies and infrastructures associated with The Cancer Genome Atlas, Human Protein Atlas, ENCODE Project, 1000 Genomes Project, and PDBbind to annotate protein–protein, protein–small molecule, and genetic interactions for resources such as ClinVar, COSMIC, PharmGKB, DrugBank, and OMIM. Applications span network analysis for research at Dana-Farber Cancer Institute, Sanger Institute, Broad Institute, and Memorial Sloan Kettering Cancer Center, translational pipelines at GlaxoSmithKline, Pfizer, Roche, and Novartis, and collaborative consortia like IMEx Consortium, HUPO, and GA4GH.

Challenges and Future Directions

Challenges include integrating high-throughput data from platforms like Illumina, Oxford Nanopore Technologies, and Pacific Biosciences, reconciling variant annotations from dbSNP and ClinVar, and coordinating with emerging standards such as GA4GH schemas and resources like Open Targets. Future work emphasizes interoperability with cloud infrastructures provided by Amazon Web Services, Google Cloud Platform, and Microsoft Azure for large-scale computation used by research centers at Argonne National Laboratory and Lawrence Berkeley National Laboratory, tighter alignment with ontologies maintained by OBO Foundry, and enhanced provenance tracking to satisfy funders including European Research Council and National Institutes of Health.

Category:Bioinformatics standards