| OMOP Common Data Model | |
|---|---|
| Name | OMOP Common Data Model |
| Alt | Observational Medical Outcomes Partnership Common Data Model |
| Developer | Observational Health Data Sciences and Informatics |
| Released | 2010s |
| Latest release | OHDSI releases |
| License | open |
OMOP Common Data Model
The OMOP Common Data Model (CDM) is a standardized data schema and set of conventions for organizing clinical and administrative healthcare data to enable large-scale observational research, comparative effectiveness studies, pharmacovigilance, and reproducible analyses across institutions. It provides a consistent table structure, standardized vocabularies, and transformation procedures used by consortia and organizations to harmonize disparate sources such as electronic health records, claims databases, registries, and research cohorts for multi-database studies.
The model was designed to support consistent analytics across disparate datasets by prescribing tables for person-level data, encounters, diagnoses, procedures, drug exposures, measurements, observations, and provider information. This facilitates multi-site studies comparable to those run by distributed research networks such as the FDA's Sentinel Initiative, PCORnet, and the Vaccine Safety Datalink, and supports work of interest to bodies such as the European Medicines Agency and the National Institutes of Health. The CDM is paired with a suite of analytic methods and tools from the Observational Health Data Sciences and Informatics (OHDSI) community, enabling reproducible research across data partners and vendors such as Aetion, IQVIA, Optum, and Truven Health Analytics, health systems such as Kaiser Permanente, and academic centers such as Harvard University, Stanford University, Columbia University, and Johns Hopkins University.
Development traces to the Observational Medical Outcomes Partnership (OMOP), a public–private partnership in pharmacoepidemiology that involved regulators such as the U.S. Food and Drug Administration, industry stakeholders including Eli Lilly and Company and Johnson & Johnson, and academic groups from Columbia University and the University of Pennsylvania. Subsequent community-driven evolution has been stewarded by Observational Health Data Sciences and Informatics (OHDSI), with releases shaped by contributors from Surescripts, Mayo Clinic, Massachusetts General Hospital, and international partners, including European Medicines Agency collaborators and projects in the United Kingdom, the Netherlands, Japan, and Australia.
The CDM defines core tables (e.g., PERSON, OBSERVATION_PERIOD, VISIT_OCCURRENCE, CONDITION_OCCURRENCE, PROCEDURE_OCCURRENCE, DRUG_EXPOSURE, MEASUREMENT, OBSERVATION, PROVIDER, and DEATH) and supporting vocabulary tables to represent clinical events and attributes. Its relational schema is designed to receive data from electronic health record systems such as those of Epic Systems Corporation and Cerner Corporation, and from claims sources such as Medicare, Medicaid, and commercial plans like Blue Cross Blue Shield. Because each clinical event is stored with its date and linked to a person and an observation period, the model supports cohort definition and analytical reproducibility. Analyses are typically written in R or Python against database platforms such as Microsoft SQL Server and PostgreSQL, including deployments on cloud services such as Amazon Web Services and Google Cloud Platform.
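The table layout described above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module and a heavily reduced subset of three tables (PERSON, OBSERVATION_PERIOD, CONDITION_OCCURRENCE); the real specification defines many more columns and tables. The sample rows, the helper names `build_demo_db` and `first_condition_in_observation`, and the concept identifiers are assumptions chosen for illustration only.

```python
import sqlite3

# Reduced, illustrative subset of three CDM tables (not the full specification).
SCHEMA = """
CREATE TABLE person (
    person_id          INTEGER PRIMARY KEY,
    gender_concept_id  INTEGER NOT NULL,
    year_of_birth      INTEGER NOT NULL
);
CREATE TABLE observation_period (
    observation_period_id          INTEGER PRIMARY KEY,
    person_id                      INTEGER NOT NULL REFERENCES person(person_id),
    observation_period_start_date  TEXT NOT NULL,
    observation_period_end_date    TEXT NOT NULL
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id  INTEGER PRIMARY KEY,
    person_id                INTEGER NOT NULL REFERENCES person(person_id),
    condition_concept_id     INTEGER NOT NULL,
    condition_start_date     TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Concept identifiers below are used purely as example values.
    conn.execute("INSERT INTO person VALUES (1, 8532, 1960)")
    conn.execute("INSERT INTO person VALUES (2, 8507, 1975)")
    conn.execute("INSERT INTO observation_period VALUES (1, 1, '2015-01-01', '2020-12-31')")
    conn.execute("INSERT INTO observation_period VALUES (2, 2, '2018-01-01', '2020-12-31')")
    conn.execute("INSERT INTO condition_occurrence VALUES (1, 1, 201826, '2016-03-14')")
    conn.execute("INSERT INTO condition_occurrence VALUES (2, 1, 201826, '2019-07-02')")
    # This event precedes person 2's observation period and should be excluded.
    conn.execute("INSERT INTO condition_occurrence VALUES (3, 2, 201826, '2017-05-20')")
    return conn


def first_condition_in_observation(conn, concept_id):
    """Return (person_id, first_date) pairs for events inside an observation period."""
    query = """
        SELECT co.person_id, MIN(co.condition_start_date)
        FROM condition_occurrence AS co
        JOIN observation_period AS op
          ON op.person_id = co.person_id
         AND co.condition_start_date BETWEEN op.observation_period_start_date
                                         AND op.observation_period_end_date
        WHERE co.condition_concept_id = ?
        GROUP BY co.person_id
        ORDER BY co.person_id
    """
    return conn.execute(query, (concept_id,)).fetchall()


conn = build_demo_db()
cohort = first_condition_in_observation(conn, 201826)
print(cohort)
```

Dates are stored as ISO 8601 text so that SQLite's string comparison orders them correctly; a production database would normally use a native date type. With the sample rows above, only person 1 qualifies, because person 2's only event falls outside the recorded observation period.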
A core tenet is harmonization via standardized clinical vocabularies, including SNOMED CT, RxNorm, ICD-9-CM, ICD-10-CM, LOINC, CPT (Current Procedural Terminology), and HCPCS. Mapping and concept management workflows draw on resources from the National Library of Medicine and on the work of standards organizations such as Health Level Seven International (HL7), including Fast Healthcare Interoperability Resources (FHIR). The vocabulary tables enable crosswalks from proprietary coding systems used by vendors such as Allscripts and from regional coding schemes used in Canada, France, and Germany.
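How such a crosswalk might look can be sketched with simplified versions of the CONCEPT and CONCEPT_RELATIONSHIP vocabulary tables, in which a source code is linked to a standard concept through a "Maps to" relationship. The sketch below uses Python's `sqlite3` module; the concept identifiers (9000001, 9000002) are invented for illustration, the tables carry only a few of their real columns, and `map_source_code` is a hypothetical helper rather than part of any OHDSI tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id        INTEGER PRIMARY KEY,
    concept_name      TEXT NOT NULL,
    vocabulary_id     TEXT NOT NULL,
    concept_code      TEXT NOT NULL,
    standard_concept  TEXT            -- 'S' marks a standard concept
);
CREATE TABLE concept_relationship (
    concept_id_1     INTEGER NOT NULL,
    concept_id_2     INTEGER NOT NULL,
    relationship_id  TEXT NOT NULL
);
""")

# Invented concept_id values; only the codes and vocabulary names mirror real usage.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (9000001, "Type 2 diabetes mellitus without complications", "ICD10CM", "E11.9", None),
        (9000002, "Type 2 diabetes mellitus", "SNOMED", "44054006", "S"),
    ],
)
conn.execute("INSERT INTO concept_relationship VALUES (9000001, 9000002, 'Maps to')")


def map_source_code(conn, vocabulary_id, concept_code):
    """Return the standard concept_id for a source code, or 0 when no mapping exists."""
    row = conn.execute(
        """
        SELECT target.concept_id
        FROM concept AS source
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = source.concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept AS target
          ON target.concept_id = cr.concept_id_2
         AND target.standard_concept = 'S'
        WHERE source.vocabulary_id = ?
          AND source.concept_code = ?
        """,
        (vocabulary_id, concept_code),
    ).fetchone()
    return row[0] if row else 0


mapped = map_source_code(conn, "ICD10CM", "E11.9")
unmapped = map_source_code(conn, "ICD10CM", "NOT-A-CODE")
print(mapped, unmapped)
```

Returning 0 for codes without a mapping follows the common convention of reserving concept 0 for unmapped values, so downstream queries can count unmapped records instead of silently dropping them.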
Implementations rely on ETL (extract, transform, load) pipelines and toolkits developed by OHDSI and partners, including ATLAS, ACHILLES, WhiteRabbit, and Rabbit-in-a-Hat, along with analytic packages maintained in repositories by institutions such as Columbia University and Harvard Medical School. The ecosystem includes cohort definition, characterization, and estimation tools, many implemented in R, and is deployed on infrastructure used by the All of Us Research Program, regional health information exchanges, academic medical centers, and commercial data providers.
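The transform step of such a pipeline can be sketched in plain Python. This is not how WhiteRabbit or Rabbit-in-a-Hat work internally; it is a minimal illustration that assumes a hypothetical source extract with the fields `patient_id`, `coding_system`, `code`, and `dx_date` (in MM/DD/YYYY format), and an illustrative lookup table `CODE_TO_CONCEPT` standing in for the vocabulary mapping.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ConditionOccurrence:
    """Reduced CONDITION_OCCURRENCE record; the real table has more columns."""
    condition_occurrence_id: int
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


# Illustrative lookup; a real ETL would query the vocabulary tables instead.
CODE_TO_CONCEPT = {("ICD10CM", "E11.9"): 201826}


def transform(source_rows, code_map):
    """Convert hypothetical source diagnosis rows into CDM-style records."""
    records = []
    for next_id, row in enumerate(source_rows, start=1):
        key = (row["coding_system"], row["code"])
        records.append(
            ConditionOccurrence(
                condition_occurrence_id=next_id,
                person_id=int(row["patient_id"]),
                # Unmapped codes get concept 0 rather than being discarded.
                condition_concept_id=code_map.get(key, 0),
                condition_start_date=datetime.strptime(row["dx_date"], "%m/%d/%Y").date(),
                # Keep the original code so the mapping can be audited later.
                condition_source_value=row["code"],
            )
        )
    return records


source_rows = [
    {"patient_id": "17", "coding_system": "ICD10CM", "code": "E11.9", "dx_date": "03/14/2016"},
    {"patient_id": "17", "coding_system": "LOCAL", "code": "DM2", "dx_date": "07/02/2019"},
]
records = transform(source_rows, CODE_TO_CONCEPT)
for record in records:
    print(record)
```

Preserving the source value alongside the mapped concept is the main design choice here: it lets a site revisit unmapped or questionable codes without re-extracting the source data.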
The CDM supports pharmacovigilance studies, comparative effectiveness research, safety signal detection, patient-level prediction modeling, and population-level estimation. Users include regulators and public health bodies such as the U.S. Food and Drug Administration, the European Medicines Agency, and the World Health Organization; pharmaceutical companies including Pfizer, AstraZeneca, and GlaxoSmithKline; and academic research networks such as OHDSI, PCORnet, and consortia at Johns Hopkins University and Mayo Clinic. It enables multi-database studies spanning claims databases such as Medicare, cancer and other disease registries, and disease-specific research networks in rheumatology, cardiology, and infectious disease.
Governance and maintenance are coordinated by OHDSI, with contributions from academia, industry, and regulatory and funding agencies such as the National Institutes of Health, the European Medicines Agency, and the U.S. Food and Drug Administration, as well as participating universities and vendors. The community organizes working groups, annual forums, and collaboratives that draw participants from institutions such as Stanford University, Harvard University, Massachusetts General Hospital, Mayo Clinic, and the University of Oxford, and from international partners across Asia, Europe, and the Americas, to evolve standards, tools, and best practices.
Category:Health informatics