National Inpatient Sample

Name	National Inpatient Sample
Producer	Agency for Healthcare Research and Quality
Country	United States
Subject	Healthcare, Hospitalization, Administrative claims
Started	1988
Frequency	Annual

Contents

Overview
History and Development
Data Structure and Content
Methodology and Sampling Design
Uses and Applications
Limitations and Criticisms
Access, Privacy, and Use Policies

The National Inpatient Sample (NIS) is a United States database of hospital inpatient stays maintained by the Agency for Healthcare Research and Quality (AHRQ). Researchers, policymakers, and clinicians use it to analyze hospitalization patterns, costs, and outcomes. It supports studies that inform decisions at institutions such as the Centers for Disease Control and Prevention, the Centers for Medicare & Medicaid Services, and academic centers including Harvard Medical School, Johns Hopkins University, and the University of California. Researchers from organizations such as the World Health Organization, the Bill & Melinda Gates Foundation, and the National Institutes of Health have also used it for population health and health services research.

Overview

The dataset is the largest publicly available all-payer inpatient care database in the United States and is part of the Healthcare Cost and Utilization Project (HCUP), overseen by the Agency for Healthcare Research and Quality. It aggregates discharge abstracts collected by state-level organizations such as the California Department of Public Health, the New York State Department of Health, and the Texas Department of State Health Services, which allows analyses that span providers such as Mayo Clinic, Cleveland Clinic, and Massachusetts General Hospital. Inpatient diagnoses and procedures are coded with the International Classification of Diseases (ICD-9 in earlier years and ICD-10 in later years), and the sample is frequently used alongside datasets such as the National Health and Nutrition Examination Survey and the Medical Expenditure Panel Survey.

History and Development

The NIS data series begins in 1988. It was developed under the Healthcare Cost and Utilization Project, sponsored by the Agency for Healthcare Research and Quality (known before 1999 as the Agency for Health Care Policy and Research), to standardize inpatient data collection across participating states such as California, Florida, and New York. Over time, contributors have included state data organizations and federal partners such as the Centers for Medicare & Medicaid Services and the National Center for Health Statistics. Major methodological revisions occurred in the 1990s and in 2012. The file also had to accommodate changes in coding frameworks, notably the United States transition from ICD-9 to ICD-10 coding in October 2015, and analytic needs expressed by institutions such as the Harvard School of Public Health and think tanks including the Kaiser Family Foundation. Scholars affiliated with Columbia University, Stanford University, and the University of Michigan played roles in refining sampling and weighting procedures.

Data Structure and Content

Each record in the dataset is a hospital discharge abstract with variables for patient demographics, diagnoses, procedures, admission and discharge status, payer source, length of stay, and total charges. Diagnosis and procedure variables follow coding systems such as ICD-9 and ICD-10, along with billing frameworks used by Medicare and private insurers. Hospital attributes in the file may include ownership and teaching status, the kind of characteristics that distinguish institutions such as Johns Hopkins Hospital and UCLA Medical Center, and geographic markers for census regions (Northeast, Midwest, South, and West). The data structure supports case-mix adjustment methods used in analyses at organizations such as Yale University and Duke University.
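The record layout described above can be pictured with a small Python sketch. The class name, field names, and sample values below are hypothetical and chosen only for illustration; they are not the variable names used in the actual NIS files, and the charges are made-up numbers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DischargeRecord:
    """One inpatient stay. All field names are illustrative assumptions."""
    age: Optional[int]
    female: Optional[bool]
    diagnoses: List[str]            # ICD diagnosis codes, stored without decimals
    procedures: List[str]           # ICD procedure codes
    died: Optional[bool]            # discharge status
    payer: str                      # expected primary payer
    length_of_stay: Optional[int]   # days
    total_charges: Optional[float]  # dollars; may be missing
    hospital_region: str            # census region of the hospital
    teaching_hospital: bool


def mean_charges_by_payer(records: List[DischargeRecord]) -> Dict[str, float]:
    """Unweighted mean of total charges per payer, skipping missing charges."""
    sums: Dict[str, tuple] = {}
    for r in records:
        if r.total_charges is None:
            continue
        total, count = sums.get(r.payer, (0.0, 0))
        sums[r.payer] = (total + r.total_charges, count + 1)
    return {payer: total / count for payer, (total, count) in sums.items()}


# Made-up example records (not real NIS data).
sample = [
    DischargeRecord(71, True, ["I214"], [], False, "Medicare", 4, 10000.0, "Northeast", True),
    DischargeRecord(66, False, ["J189"], [], False, "Medicare", 6, 20000.0, "South", False),
    DischargeRecord(45, True, ["E119"], [], False, "Private", 2, 8000.0, "West", False),
    DischargeRecord(80, False, ["J189"], [], True, "Medicare", 9, None, "Midwest", True),
]

print(mean_charges_by_payer(sample))
```

The summary here is deliberately unweighted; producing national estimates from a sample of this kind requires the discharge weights discussed under Methodology and Sampling Design.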

Methodology and Sampling Design

The NIS employs a stratified, probability-based sampling design that draws from participating statewide inpatient databases. Strata are defined by hospital characteristics such as ownership, bed size, teaching status, and urban-rural location. Weighting algorithms permit national estimates, and the variance estimation procedures mirror methods used in complex survey analysis at institutions such as Princeton University and the University of Chicago. The dataset is constructed to allow trend analyses across policy changes, including legislation such as the Affordable Care Act and programmatic shifts in Centers for Medicare & Medicaid Services payment models. Methodological guidance has been published and applied by researchers at the RAND Corporation and the Brookings Institution.
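To make the weighting and variance ideas concrete, the sketch below computes a weighted national total, a weighted mean, and a standard error for the total. It assumes a generic stratified design with hospitals treated as clusters and uses the common with-replacement (Taylor linearization) approximation; this is a textbook survey formula, not necessarily the exact procedure documented for the NIS. The key names (`stratum`, `hospital`, `weight`, `los`) and the records are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def weighted_total(records, value_key, weight_key="weight"):
    """Estimated population total: sum of weight * value."""
    return sum(r[weight_key] * r[value_key] for r in records)


def weighted_mean(records, value_key, weight_key="weight"):
    """Estimated population mean: weighted total divided by sum of weights."""
    total_weight = sum(r[weight_key] for r in records)
    return weighted_total(records, value_key, weight_key) / total_weight


def total_standard_error(records, value_key, weight_key="weight",
                         stratum_key="stratum", cluster_key="hospital"):
    """Standard error of the weighted total for a stratified cluster sample.

    Within each stratum, the variance contribution is
    n / (n - 1) * sum((z_i - z_bar)^2), where z_i is the weighted total
    for cluster i and n is the number of clusters in the stratum.
    Strata with a single cluster contribute nothing in this sketch.
    """
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        cluster_totals[r[stratum_key]][r[cluster_key]] += r[weight_key] * r[value_key]

    variance = 0.0
    for clusters in cluster_totals.values():
        n = len(clusters)
        if n < 2:
            continue
        z_bar = sum(clusters.values()) / n
        variance += n / (n - 1) * sum((z - z_bar) ** 2 for z in clusters.values())
    return sqrt(variance)


# Made-up sample: two strata, two hospitals each, length of stay in days.
records = [
    {"stratum": 1, "hospital": "A", "weight": 5.0, "los": 3},
    {"stratum": 1, "hospital": "A", "weight": 5.0, "los": 5},
    {"stratum": 1, "hospital": "B", "weight": 5.0, "los": 2},
    {"stratum": 2, "hospital": "C", "weight": 4.0, "los": 10},
    {"stratum": 2, "hospital": "D", "weight": 4.0, "los": 6},
]

print(weighted_total(records, "los"))        # estimated total inpatient days
print(weighted_mean(records, "los"))         # estimated mean length of stay
print(total_standard_error(records, "los"))  # standard error of the total
```

In practice, analysts usually rely on survey procedures in statistical packages rather than hand-written formulas; the point of the sketch is only that ignoring the weights, strata, or clustering changes both the estimates and their standard errors.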

Uses and Applications

Investigators use the data to examine epidemiology, utilization, outcomes, cost trends, and disparities for conditions treated in hospitals, informing policy debates in venues such as the United States Congress and analyses by advocacy groups including the American Hospital Association. Studies published in journals such as Mayo Clinic Proceedings, the New England Journal of Medicine, and JAMA have used the sample to examine topics ranging from cardiovascular disease to surgical outcomes, including work by researchers at Cleveland Clinic and Johns Hopkins Hospital. Health economists from Harvard Kennedy School and the London School of Economics have used it to model resource use and payer impacts. Public health agencies such as the Centers for Disease Control and Prevention use it for surveillance of hospitalizations related to infectious diseases and injury.

Limitations and Criticisms

Critiques of the dataset include concerns about representativeness when states opt out or change reporting, coding inaccuracies inherent in administrative claims (noted by experts at Emory University and the University of Pennsylvania), and limited clinical detail compared with registries such as the Society of Thoracic Surgeons database or electronic health record systems at institutions such as Mount Sinai Health System. Researchers have also highlighted challenges in causal inference, a topic of debate at American Public Health Association conferences and of methodological work at the University of Washington. Temporal comparability across coding transitions, such as the change from ICD-9 to ICD-10, and the impact of hospital consolidation involving systems such as HCA Healthcare are recurrent topics in the peer-reviewed literature.
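One common way to handle the coding transition in trend work is to label each observation with its coding era so that the break can be modeled or analyzed separately. The sketch below assumes the United States switch to ICD-10 coding for discharges on or after October 1, 2015 (the fourth calendar quarter of 2015). The function names and the quarterly counts are hypothetical.

```python
from collections import defaultdict

# First (year, quarter) coded under ICD-10 in the United States.
ICD10_START = (2015, 4)


def coding_era(year: int, quarter: int) -> str:
    """Return the coding era for a discharge year and calendar quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return "ICD-10" if (year, quarter) >= ICD10_START else "ICD-9"


def discharges_by_era(rows):
    """Sum discharge counts per era; rows are (year, quarter, count) tuples."""
    totals = defaultdict(int)
    for year, quarter, count in rows:
        totals[coding_era(year, quarter)] += count
    return dict(totals)


# Made-up quarterly counts for a single condition.
example_rows = [
    (2015, 2, 100),
    (2015, 3, 110),
    (2015, 4, 95),
    (2016, 1, 105),
]

print(discharges_by_era(example_rows))
```

A jump between the two eras in a series like this may reflect the change in code definitions rather than a real change in hospitalizations, which is why analysts often report the eras separately or add an indicator for the transition.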

Access, Privacy, and Use Policies

Access to the data requires purchase and adherence to data use agreements administered by the Agency for Healthcare Research and Quality, with oversight consistent with privacy standards promoted by the Office for Civil Rights and federal statutes such as the Health Insurance Portability and Accountability Act of 1996. Users are expected to comply with de-identification standards and with institutional review processes, such as those of Institutional Review Boards affiliated with Yale University or Columbia University. Data linkage and re-identification risks are subjects of policy analysis by experts at the National Institutes of Health and of commentary by the Electronic Frontier Foundation.

Category:Health datasets