Genome-wide association study

Name	Genome-wide association study
Field	Genetics
Introduced	2005
Notable	International HapMap Project, 1000 Genomes Project

Contents

Introduction
Methodology
Statistical Analysis and Interpretation
Applications and Findings
Limitations and Challenges
Ethical, Legal, and Social Implications

A genome-wide association study (GWAS) is a research approach that scans the genomes of many individuals to find genetic variants associated with traits or diseases. A GWAS tests variants, chiefly single-nucleotide polymorphisms (SNPs), for statistical association with a phenotype across a cohort, using high-throughput genotyping arrays and imputation against reference panels. Large-scale reference efforts such as the International HapMap Project and, later, the 1000 Genomes Project helped establish the methods and public resources that enabled modern GWAS.

Introduction

GWAS emerged in the early 21st century, when the Human Genome Project and the International HapMap Project produced dense variant maps and consortia such as the Wellcome Trust Case Control Consortium pooled and shared data. Key studies drawing on cohorts such as the Framingham Heart Study, the UK Biobank, and the Health and Retirement Study demonstrated the power of population-scale genotyping to detect associations for complex traits. Important contributors and advocates include investigators affiliated with the Broad Institute, the Wellcome Trust, and universities such as Harvard University and Stanford University. Published associations are catalogued in public resources such as the GWAS Catalog, established by the National Human Genome Research Institute of the National Institutes of Health and now maintained jointly with the European Bioinformatics Institute.

Methodology

GWAS typically draw participants from cohorts such as the Nurses' Health Study, the Rotterdam Study, or the Million Veteran Program, and assay their DNA on genotyping arrays produced by companies like Illumina and Affymetrix. Quality control pipelines, including those developed in collaborative networks such as the Global Lipids Genetics Consortium, remove samples and markers with high missingness and exclude samples whose ancestry, judged against reference panels like the 1000 Genomes Project, is unexpected. Imputation against panels such as the 1000 Genomes Project and the Haplotype Reference Consortium then increases marker density by inferring untyped variants, using software packages created by groups at institutions like the Wellcome Trust Sanger Institute and the University of Michigan. Studies of disease often employ a case–control design, comparing variant frequencies between affected and unaffected participants, as in studies of conditions investigated at centers including the Mayo Clinic and the Johns Hopkins Hospital.
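
The missingness filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline of any particular consortium: the function names, the genotype encoding (0, 1, or 2 copies of the alternate allele, with None for a failed call), and the 5% default thresholds are all assumptions chosen for the example, and real studies set their own cutoffs and add further checks such as ancestry and relatedness.

```python
from typing import List, Optional, Tuple

# Assumed encoding: 0, 1, or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of calls that are missing (None)."""
    if not calls:
        return 0.0
    return sum(c is None for c in calls) / len(calls)


def filter_by_missingness(
    matrix: List[List[Genotype]],
    max_sample_missing: float = 0.05,  # hypothetical default threshold
    max_marker_missing: float = 0.05,  # hypothetical default threshold
) -> Tuple[List[int], List[int]]:
    """Return (kept_sample_indices, kept_marker_indices).

    matrix[i][j] is the genotype of sample i at marker j. Samples are
    filtered first; marker missingness is then computed over the
    retained samples only.
    """
    kept_samples = [
        i for i, row in enumerate(matrix)
        if missing_rate(row) <= max_sample_missing
    ]
    n_markers = len(matrix[0]) if matrix else 0
    kept_markers = []
    for j in range(n_markers):
        column = [matrix[i][j] for i in kept_samples]
        if missing_rate(column) <= max_marker_missing:
            kept_markers.append(j)
    return kept_samples, kept_markers
```

The order of the two filters is a design choice: filtering samples first prevents a few poorly genotyped samples from causing otherwise good markers to be discarded, but some pipelines apply the filters in the opposite order or iterate.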

Statistical Analysis and Interpretation

Statistical methods in GWAS draw on approaches popularized in publications from the Wellcome Trust Case Control Consortium and by methods teams at the Broad Institute and the University of Oxford. Each variant is commonly tested with logistic regression for case–control traits or linear regression for quantitative traits, or with linear mixed models that also account for relatedness among samples; tools implementing these models have come from institutions such as the University of California, Los Angeles, and are run on large-scale computing infrastructure. Population stratification is commonly corrected by including principal components of the genotype data as covariates, an approach implemented in software dating from the Perlegen Sciences era and later refined by groups at Princeton University and the University of Chicago. Because hundreds of thousands to millions of variants are tested, a stringent multiple-testing correction is required; the conventional genome-wide significance threshold of p < 5 × 10^-8 was popularized following debates among statisticians at venues including the American Statistical Association and in major journals. Post-GWAS interpretation often involves fine-mapping and functional annotation using datasets from the ENCODE Project and the Roadmap Epigenomics Project, with laboratory follow-up at institutions such as the National Institutes of Health and the Howard Hughes Medical Institute.
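
As a rough illustration of single-variant testing and multiple-testing correction, the sketch below uses a one-degree-of-freedom allelic chi-square test on a 2x2 table of allele counts. This is a deliberately simplified stand-in for the logistic regression and mixed-model analyses mentioned above: it takes no covariates, so it cannot adjust for principal components or relatedness. The function names are hypothetical, and the figure of one million tests is an assumption commonly used to motivate the conventional threshold.

```python
import math
from typing import Tuple


def allelic_chi_square(
    case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int
) -> Tuple[float, float]:
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (statistic, p_value). No covariate adjustment is performed.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    denom = (
        (case_alt + case_ref) * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt) * (case_ref + ctrl_ref)
    )
    if denom == 0:
        # A row or column is empty (e.g. monomorphic marker): no evidence.
        return 0.0, 1.0
    stat = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: about one million independent tests, which gives roughly 5e-8.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)
```

For example, allele counts of 60 alternate and 40 reference in cases against 40 and 60 in controls give a statistic of 8.0 and a p-value of roughly 0.005. That would count as significant for a single test but falls far short of the genome-wide threshold, which is why GWAS require large samples.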

Applications and Findings

GWAS have identified loci for traits and diseases studied by consortia like the CARDIoGRAMplusC4D Consortium (cardiovascular disease), the GIANT Consortium (anthropometry), and psychiatric collaborations such as the Psychiatric Genomics Consortium. Landmark findings implicating genes or regions have been followed up in translational settings at pharmaceutical companies and academic centers including Pfizer, GlaxoSmithKline, Genentech, Massachusetts General Hospital, and the Sanger Institute. GWAS discoveries have informed polygenic risk scores, which combine the estimated effects of many variants into a single per-person score, used in research at institutions such as the University of Cambridge and in clinical studies run by the National Health Service in the United Kingdom. Traits ranging from height (studied by the GIANT Consortium) to type 2 diabetes (investigated by the DIAGRAM Consortium) and Alzheimer's disease (studied by the Alzheimer's Disease Genetics Consortium) have yielded replicable loci.
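
In its simplest form, a polygenic risk score is a weighted sum: each variant's allele dosage is multiplied by the effect size estimated for it in a GWAS, and the products are added up. The sketch below shows only that arithmetic. The function name and the variant identifiers are hypothetical, and practical scoring methods also handle variant selection, linkage disequilibrium, and allele matching, none of which are attempted here.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages.

    dosages: variant id -> number of effect alleles carried (0 to 2).
    weights: variant id -> per-allele effect size from a GWAS
             (for example a log odds ratio).
    Variants with a weight but no genotype are skipped.
    """
    return sum(
        weight * dosages[variant]
        for variant, weight in weights.items()
        if variant in dosages
    )


# Hypothetical example values, for illustration only.
example_weights = {"var1": 0.10, "var2": -0.20, "var3": 0.50}
example_dosages = {"var1": 2, "var2": 1}
example_score = polygenic_score(example_dosages, example_weights)
```

Skipping variants that lack a genotype is a simplification made for this example; it makes scores from people with different sets of missing variants not directly comparable, which is one reason imputation usually precedes scoring.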

Limitations and Challenges

Investigators at the Wellcome Trust and at academic centers including Columbia University and Yale University have highlighted several challenges. Individual variants typically explain only a small share of trait variation, and the gap between the heritability estimated from family studies and that explained by identified variants, known as the missing heritability problem, has been debated at meetings of the American Society of Human Genetics. Findings also transfer poorly across populations, as underscored by research in diverse cohorts from the 1000 Genomes Project and the H3Africa Consortium. Technical issues such as genotype array bias, batch effects encountered in studies coordinated with the UK Biobank, and confounding by environmental covariates have been addressed in guidelines from organizations like the International Committee of Medical Journal Editors and in workshops at the Cold Spring Harbor Laboratory.

Ethical, Legal, and Social Implications

Ethical, legal, and social considerations have been central to policy discussions at bodies such as the National Institutes of Health, the United States Department of Health and Human Services, and the European Commission, and in advisory groups formed by institutions like the Wellcome Trust. Consent models, data sharing practices exemplified by the dbGaP archive, and privacy risks have prompted governance frameworks discussed in forums including meetings at the World Health Organization and in statements from the American Medical Association. Equity concerns about representation and benefit-sharing have led to initiatives such as the All of Us Research Program and to collaborations with regional efforts like the H3Africa Consortium and the Latin American Genomics Consortium.
