Observational Health Data Sciences and Informatics

Name	Observational Health Data Sciences and Informatics
Abbreviation	OHDSI
Formation	2014
Purpose	Collaborative research in observational health data
Headquarters	Bethesda, Maryland
Region served	International

Contents

History and Development
Governance, Organizational Structure, and Funding
Key Methods and Technologies
Data Sources and Standardization
Research Applications and Impact
Community, Collaboration, and Training

Observational Health Data Sciences and Informatics (OHDSI) is a multi-stakeholder collaborative that develops open-source methods, software, and standards for large-scale observational research using health data. It coordinates contributions from academic centers, companies, and government agencies to enable reproducible evidence generation across distributed databases and international networks. The collaborative emphasizes common data models, transparency, and community-driven governance to accelerate studies in pharmacoepidemiology, outcomes research, and comparative effectiveness research.

History and Development

Founded in 2014, the collaborative grew out of earlier consortia and projects, notably the Observational Medical Outcomes Partnership (OMOP), with roots in the National Institutes of Health, the Centers for Medicare and Medicaid Services, the Food and Drug Administration, the European Medicines Agency, and academic institutions such as Columbia University, Harvard University, Stanford University, the Massachusetts Institute of Technology, and Johns Hopkins University. Early contributors included leaders from the University of Oxford, the Karolinska Institutet, the University of Pennsylvania, the University of California, San Francisco, and the University of Michigan, who built on models used by UK Biobank, Kaiser Permanente, the Veterans Health Administration, and Health Level Seven International. Over time, partnerships expanded to include industry partners such as Roche, Pfizer, AstraZeneca, Novartis, and GlaxoSmithKline, as well as international health and standards organizations such as the World Health Organization and the International Organization for Standardization. Events that shaped the collaborative included workshops at the National Library of Medicine, presentations at American Medical Informatics Association meetings, and funding discussions involving the Wellcome Trust and the Bill & Melinda Gates Foundation.

Governance, Organizational Structure, and Funding

The governance model comprises steering committees, working groups, and a coordinating center, with participation from institutions such as Yale University, Brigham and Women's Hospital, Mayo Clinic, Cleveland Clinic, and the Fred Hutchinson Cancer Research Center. Funding has combined grants from the National Science Foundation, project support from the European Commission, contracts with the Centers for Disease Control and Prevention, and corporate sponsorship from companies including IBM and Google. Advisory roles often include representatives of the Academy of Medical Sciences, the Royal Society, the American Heart Association, the American College of Physicians, and international bodies such as the European Medicines Agency and the Pan American Health Organization. The collaborative's intellectual property and open-source licensing practices draw on precedents set by the Apache Software Foundation and the Linux Foundation.

Key Methods and Technologies

Methodological work combines causal inference frameworks from researchers affiliated with Yale University, Harvard University, Duke University, Columbia University, and the University of California, Los Angeles with software practices drawn from the R Project for Statistical Computing, Python, GitHub, and tools from the Apache Software Foundation. Core technologies include the OMOP Common Data Model, which originated in the Observational Medical Outcomes Partnership; statistical approaches from work at Stanford University and the University of Washington; propensity score methods popularized at Johns Hopkins University and Mayo Clinic; guidance on negative and positive controls from Harvard Medical School investigators; and study diagnostics developed with input from Beth Israel Deaconess Medical Center and Mount Sinai Health System. Study designs also draw on methodologies associated with research at the RAND Corporation and the Brookings Institution. Reproducibility practices mirror standards promoted by the National Academy of Sciences and the Committee on Publication Ethics.
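Propensity score matching, one of the methods named above, pairs each treated person with a comparator whose estimated probability of receiving the treatment is similar, so that the two groups are more alike on measured confounders. The following is a simplified, hypothetical sketch in Python, not code from any OHDSI package: it assumes the propensity scores have already been estimated, and the function name `greedy_match` and its `caliper` default are invented for illustration. It uses greedy 1:1 nearest-neighbor matching without replacement, which is easy to follow but not guaranteed to produce the best overall set of pairs.

```python
from typing import List, Tuple


def greedy_match(
    treated: List[Tuple[str, float]],
    comparators: List[Tuple[str, float]],
    caliper: float = 0.05,  # hypothetical default: max allowed score difference
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching on precomputed propensity scores.

    treated, comparators: lists of (person_id, propensity_score).
    Returns (treated_id, comparator_id) pairs; each comparator is used at most once.
    """
    available = dict(comparators)  # comparator id -> score, still unmatched
    pairs: List[Tuple[str, str]] = []
    # Process treated persons from highest to lowest score for a deterministic order.
    for t_id, t_score in sorted(treated, key=lambda p: -p[1]):
        if not available:
            break
        # Closest remaining comparator by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
        # Otherwise this treated person stays unmatched.
    return pairs
```

For example, with treated scores 0.30, 0.62, and 0.90 and comparator scores 0.28, 0.60, and 0.41, a caliper of 0.05 yields two pairs; the treated person at 0.90 stays unmatched because no comparator lies within the caliper.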

Data Sources and Standardization

Data partners include electronic health record systems at Mayo Clinic, claims databases from the Centers for Medicare and Medicaid Services, biobanks and research cohorts such as UK Biobank and the All of Us Research Program, and registries maintained by the American College of Cardiology and the Society of Critical Care Medicine. Terminology harmonization draws on standards from SNOMED International, LOINC, the Anatomical Therapeutic Chemical Classification System, and the International Classification of Diseases maintained by the World Health Organization. Mapping and extract-transform-load (ETL) pipelines build on implementations at Intermountain Healthcare, Kaiser Permanente, Vanderbilt University Medical Center, and Children's Hospital of Philadelphia. Data governance practices align with the General Data Protection Regulation, the Health Insurance Portability and Accountability Act, policies of the European Medicines Agency, and ethics guidance from the World Medical Association and institutional review boards at the University of Toronto and Imperial College London.
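Terminology harmonization in an ETL pipeline amounts to translating each source code into a shared standard concept while keeping the original code for traceability. The Python sketch below illustrates that idea under stated assumptions: the lookup table `MAPS_TO`, the concept IDs 9001 and 9002, and the function `transform` are hypothetical and invented for illustration; the field names are only loosely modeled on the Common Data Model's condition table; and using 0 for unmapped codes is an assumed convention.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical lookup: (source vocabulary, source code) -> standard concept ID.
# The IDs are invented for illustration; they are not real vocabulary entries.
MAPS_TO: Dict[Tuple[str, str], int] = {
    ("ICD10CM", "E11.9"): 9001,
    ("ICD9CM", "250.00"): 9001,  # a code from another vocabulary, same standard concept
    ("ICD10CM", "I10"): 9002,
}

UNMAPPED = 0  # assumed convention: 0 marks a source code with no standard mapping


@dataclass
class ConditionRow:
    person_id: int
    condition_concept_id: int    # standard concept used in analyses
    condition_source_value: str  # original source code, kept for traceability


def transform(records: List[Tuple[int, str, str]]) -> List[ConditionRow]:
    """Map (person_id, vocabulary, code) source records to standardized rows."""
    rows: List[ConditionRow] = []
    for person_id, vocabulary, code in records:
        concept_id = MAPS_TO.get((vocabulary, code), UNMAPPED)
        rows.append(ConditionRow(person_id, concept_id, code))
    return rows
```

For example, `transform([(1, "ICD10CM", "E11.9"), (2, "ICD9CM", "250.00"), (3, "ICD10CM", "not-a-code")])` assigns concept IDs 9001, 9001, and 0, so the first two records can be analyzed together even though they came from different source vocabularies.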

Research Applications and Impact

Studies using the collaborative's resources have addressed drug safety and effectiveness questions comparable to investigations led by the Food and Drug Administration and the European Medicines Agency, comparative effectiveness analyses similar to work by Cochrane, and pandemic-related studies paralleling efforts by the Centers for Disease Control and Prevention, the World Health Organization, and the Johns Hopkins Bloomberg School of Public Health. Prominent applications include cardiovascular outcomes research paralleling work from the American Heart Association and oncology pharmacoepidemiology reflecting collaborations with the American Society of Clinical Oncology and the National Cancer Institute. Results have appeared in peer-reviewed journals from Nature Publishing Group, The Lancet, the New England Journal of Medicine, the JAMA Network, and BMJ Publishing Group, and the work has figured in policy discussions involving the Organisation for Economic Co-operation and Development and G20 health working groups.

Community, Collaboration, and Training

The collaborative maintains an active community. Annual meetings are held at venues also used by the American Medical Informatics Association and the International Society for Pharmacoepidemiology, training programs resemble those at Harvard Medical School and the Stanford Center for Professional Development, and workshops are modeled on initiatives from the European Society of Cardiology and the Royal College of Physicians. Educational materials and code contributions are archived on platforms such as GitHub and shared at conferences organized by the Society for Epidemiologic Research and AcademyHealth, and learners are supported by mentoring from faculty at Columbia University, the University of Minnesota, and the University of Melbourne. International chapters work with institutions including Peking University, Seoul National University, the University of São Paulo, and the University of Cape Town to expand capacity in data science and observational research.
