GTEx Project

Name	Genotype-Tissue Expression Project
Abbreviation	GTEx
Start	2010
Country	United States
Institution	National Institutes of Health

Contents

Introduction
History and Development
Methods and Data Collection
Findings and Scientific Impact
Data Access and Resources
Ethical, Legal, and Social Issues

The Genotype-Tissue Expression (GTEx) Project was an NIH-funded consortium that mapped tissue-specific gene expression and regulation across human tissues, enabling genotype–phenotype investigations. It coordinated large-scale tissue procurement, sequencing, and bioinformatics to link genetic variation with transcriptomic and regulatory variation, serving researchers in genomics, medicine, and computational biology. The project produced multi-tissue reference datasets used to study disease loci, functional genomics, and the molecular mechanisms underlying complex traits.

Introduction

Launched as a collaborative effort among components of the National Institutes of Health, including the National Human Genome Research Institute and the National Cancer Institute, and multiple academic centers, the consortium collected genotype, RNA sequencing, and clinical metadata across dozens of post-mortem tissues. Participants included biorepositories and hospitals such as Baylor College of Medicine, the University of Chicago, Vanderbilt University Medical Center, the University of Pennsylvania, and Johns Hopkins Hospital. The output was integrated with resources such as the Encyclopedia of DNA Elements (ENCODE) Project, the 1000 Genomes Project, the International HapMap Project, and the UK Biobank to inform interpretation of genome-wide association study signals from consortia such as the Psychiatric Genomics Consortium, the GIANT Consortium, and the CARDIoGRAMplusC4D Consortium.

History and Development

Early planning involved workshops and funding discussions with stakeholders including the National Advisory Council for Human Genome Research, the Office of the Director (NIH), and experts from the Broad Institute, Cold Spring Harbor Laboratory, and the Sanger Institute. Pilot phases leveraged tissue collection networks at institutions such as the University of Miami, Emory University, and Mount Sinai Health System, and built on prior work from the Human Genome Project and the International Cancer Genome Consortium. Key milestones included initial cohort assembly, pilot RNA-seq releases, phased data freezes, integration with projects such as the Roadmap Epigenomics Project, and GTEx Consortium publications in journals such as Nature, Science, and Nature Genetics.

Methods and Data Collection

Sample acquisition relied on organ procurement organizations, medical examiners, and tissue banks, including Gift of Life Donor Program, New England Donor Services, and Transplant Procurement Management, to obtain multiple tissues per donor. Laboratory workflows combined DNA genotyping arrays, whole-genome sequencing on Illumina, Inc. platforms, RNA sequencing protocols adapted from Stanford University and Salk Institute groups, and quality control pipelines developed with input from the University of California, San Diego, the University of California, Los Angeles, and Yale University. Analytic methods integrated tools such as PLINK, STAR (aligner), RSEM, DESeq2, Matrix eQTL, and tensorQTL to map expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs), with population genetics context from the International HapMap Project, the 1000 Genomes Project, and reference panels curated by dbGaP and NIH Data Commons partners.
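The core idea behind eQTL mapping can be illustrated with a minimal sketch: for one variant and one gene, expression is regressed on genotype dosage (0, 1, or 2 copies of the alternate allele), and the slope estimates the expression change per allele. The function name `eqtl_association`, the result type, and the toy data below are hypothetical and chosen for illustration only; this is not the consortium's pipeline, and it omits the covariate adjustment, normalization, and multiple-testing correction that production tools apply.

```python
import math
from typing import NamedTuple, Sequence


class EqtlResult(NamedTuple):
    slope: float    # estimated expression change per alternate allele
    std_err: float  # standard error of the slope
    t_stat: float   # slope / std_err, with n - 2 degrees of freedom


def eqtl_association(dosages: Sequence[float],
                     expression: Sequence[float]) -> EqtlResult:
    """Ordinary least squares fit of expression ~ intercept + slope * dosage."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(dosages, expression))

    slope = sxy / sxx
    intercept = mean_e - slope * mean_g

    # Residual variance uses n - 2 degrees of freedom (intercept + slope).
    rss = sum((e - (intercept + slope * g)) ** 2
              for g, e in zip(dosages, expression))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return EqtlResult(slope, std_err, t_stat)


# Toy data (hypothetical): six donors, one variant, one gene.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
result = eqtl_association(toy_dosages, toy_expression)
print(result)
```

In practice a p-value would be obtained from the t statistic using a Student's t distribution with n - 2 degrees of freedom, and the test would be repeated for every variant near each gene in each tissue.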

Findings and Scientific Impact

Analyses revealed widespread tissue-specific and cross-tissue regulatory variation, implicating expression quantitative trait loci at disease-associated loci reported by efforts such as the Alzheimer's Disease Genetics Consortium, The Cancer Genome Atlas, and the International Parkinson Disease Genomics Consortium. GTEx-derived catalogs of eQTLs and sQTLs informed mechanistic follow-up studies in labs at the Massachusetts Institute of Technology, Harvard Medical School, Stanford University School of Medicine, Princeton University, and the University of Cambridge. Integration of GTEx data with annotation databases such as dbSNP, ClinVar, and RefSeq improved interpretation of variants in clinical sequencing initiatives at Mayo Clinic, Cleveland Clinic, and Mount Sinai Health System. Findings influenced research on cardiometabolic traits, neuropsychiatric disorders, and immune-mediated diseases supported by NIH institutes including NIAMS, NIMH, and NHLBI.

Data Access and Resources

GTEx data releases were distributed through controlled-access and open-access repositories managed by dbGaP, the GTEx Portal, and the European Genome-phenome Archive. Data use was governed by data access committees and institutional review boards at institutions such as Columbia University, Duke University, and the University of Washington. Computational resources and analysis-ready matrices were made available via cloud platforms and collaborations with Amazon Web Services, Google Cloud Platform, and the NCBI Sequence Read Archive. Community tools and visualizers were developed by groups at the Broad Institute, the UCSC Genome Browser, Ensembl, and GENCODE to facilitate gene-level queries, colocalization analyses with tools such as COLOC, and transcript-level browsing used by researchers at the Wellcome Sanger Institute and EMBL-EBI.

Ethical, Legal, and Social Issues

The project confronted consent, privacy, and data-sharing challenges, which were addressed under the Common Rule and the Health Insurance Portability and Accountability Act, with guidance from the Presidential Commission for the Study of Bioethical Issues. Ethical frameworks involved consultation with institutional review boards at Johns Hopkins University, Stanford University, and the University of Pennsylvania, and engagement with donor families and advocacy groups such as the Alzheimer's Association and the Michael J. Fox Foundation. Debates around ancestry representation, return of results, and secondary use influenced practices recommended by organizations such as the Global Alliance for Genomics and Health and the National Academies of Sciences, Engineering, and Medicine, and by international funders including the Wellcome Trust and the European Commission.

Category:Genomics
