TCGA
Name	The Cancer Genome Atlas
Founded	2005
Founders	National Cancer Institute, National Human Genome Research Institute
Country	United States
Fields	Genomics, Oncology, Bioinformatics

Contents

History
Objectives and Scope
Data Collection and Methods
Major Findings and Impact
Data Access and Tools
Criticisms and Limitations

The Cancer Genome Atlas (TCGA) was a landmark collaborative effort to characterize the genomic changes in human cancers through large-scale sequencing and molecular profiling. Initiated by the National Cancer Institute and the National Human Genome Research Institute, the project coordinated clinical centers, sequencing centers, data repositories, and analytic groups to produce a compendium of tumor genomic data spanning more than thirty tumor types. The initiative influenced research at institutions such as the Broad Institute, Memorial Sloan Kettering Cancer Center, and Dana–Farber Cancer Institute, and shaped subsequent efforts such as the International Cancer Genome Consortium and national precision medicine programs.

History

The program was launched in 2005 by two institutes of the National Institutes of Health, the National Cancer Institute and the National Human Genome Research Institute, in response to calls from the National Academies and stakeholders including the American Association for Cancer Research and the American Society of Clinical Oncology. Early pilot studies involved collaborations with centers such as the Broad Institute, Washington University School of Medicine, and the University of California, San Francisco. Over its operational period, TCGA expanded through partnerships with sequencing centers such as Baylor College of Medicine and data hubs such as the National Center for Biotechnology Information and the European Bioinformatics Institute. The program worked with cancer hospitals including Memorial Sloan Kettering Cancer Center, Mayo Clinic, and Mount Sinai Health System to obtain biospecimens and clinical annotations. Through its alignment with consortia such as the International Cancer Genome Consortium and programs at the Wellcome Trust Sanger Institute, TCGA set standards later adopted by projects such as the All of Us Research Program and national initiatives in Canada and China.

Objectives and Scope

TCGA aimed to create a comprehensive atlas of genomic alterations across major tumor types to enable mechanistic insights and therapeutic advances. Specific objectives included cataloguing recurrent somatic mutations, copy-number alterations, DNA methylation patterns, mRNA and microRNA expression profiles, and protein expression changes across cancers such as breast cancer, lung cancer, colorectal cancer, glioblastoma multiforme, and ovarian cancer. The project sought to integrate data across platforms in order to define molecular subtypes, nominate driver genes, and connect genomic signatures to clinical variables collected by collaborating centers such as Johns Hopkins Hospital and Stanford Health Care. The scope encompassed more than thirty tumor types, coordinated by disease-specific working groups and steering committees drawing expertise from institutions including Yale School of Medicine, the University of Texas MD Anderson Cancer Center, and Fred Hutchinson Cancer Research Center.
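As a rough illustration of what "recurrent" means here, the sketch below computes the fraction of samples that carry at least one mutation in each gene. The sample identifiers and mutation calls are invented toy data, and the function name `mutation_recurrence` is hypothetical. Recurrence alone does not establish that a gene is a driver, since gene length and background mutation rate also matter, so this is a counting sketch rather than a driver-nomination method.

```python
from collections import defaultdict

def mutation_recurrence(calls, n_samples=None):
    """Return {gene: fraction of samples with at least one mutation in that gene}.

    calls     -- iterable of (sample_id, gene) pairs, one per mutation call
    n_samples -- cohort size; if omitted, only samples that appear in `calls`
                 are counted, which ignores samples with no mutations at all
    """
    carriers = defaultdict(set)   # gene -> set of samples carrying a mutation
    seen = set()                  # every sample that appears in the calls
    for sample_id, gene in calls:
        seen.add(sample_id)
        carriers[gene].add(sample_id)
    total = n_samples if n_samples is not None else len(seen)
    if total == 0:
        return {}
    return {gene: len(ids) / total for gene, ids in carriers.items()}

# Toy data (assumption): four samples, with a duplicate TP53 call in S2
# to show that a sample is counted once per gene.
toy_calls = [
    ("S1", "TP53"), ("S1", "PIK3CA"),
    ("S2", "TP53"), ("S2", "TP53"),
    ("S3", "KRAS"),
    ("S4", "TP53"),
]
freq = mutation_recurrence(toy_calls)
```

Counting carriers per gene rather than raw calls keeps a heavily mutated sample from inflating a gene's apparent recurrence.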

Data Collection and Methods

Biospecimen acquisition relied on tumor and matched normal samples from clinical partners such as the Cleveland Clinic and the University of Michigan. Standardized protocols for tissue handling, pathology review, and nucleic acid extraction were established with input from the College of American Pathologists and biobanks such as ATCC. Assay types included whole-exome sequencing, single-nucleotide polymorphism arrays, RNA sequencing, DNA methylation arrays, and reverse-phase protein arrays, performed at facilities including Baylor College of Medicine and the Broad Institute. Bioinformatics pipelines for alignment, variant calling, copy-number analysis, and expression quantification were developed with contributions from groups at the University of California, Santa Cruz, Harvard Medical School, and Cold Spring Harbor Laboratory. Quality control and data harmonization leveraged resources from the National Center for Biotechnology Information and the Cancer Genome Characterization Initiative to ensure reproducibility across platforms.
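The reason for collecting a matched normal sample alongside each tumor can be shown with a deliberately simplified sketch: variants seen in the tumor but absent from the patient's normal tissue are candidate somatic changes, while shared variants are presumed germline. The `Variant` type, the `somatic_candidates` function, and the coordinates below are hypothetical illustrations, not part of any TCGA pipeline. Production somatic callers use statistical models of read evidence rather than a plain set difference.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str   # chromosome name
    pos: int     # 1-based position
    ref: str     # reference allele
    alt: str     # alternate allele

def somatic_candidates(tumor_variants, normal_variants):
    """Return tumor variants that are absent from the matched normal sample."""
    germline = set(normal_variants)
    # De-duplicate the tumor calls, drop anything also seen in the normal,
    # and sort by (chrom, pos, ref, alt) for stable output.
    return sorted(v for v in set(tumor_variants) if v not in germline)

# Toy calls (assumption): one variant shared with the normal, one tumor-only.
shared = Variant("chr1", 100, "A", "G")
tumor_only = Variant("chr17", 200, "C", "T")

candidates = somatic_candidates([shared, tumor_only, tumor_only], [shared])
```

Here `candidates` contains only the tumor-only variant; the shared one is filtered out as presumed germline.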

Major Findings and Impact

TCGA produced seminal results, including catalogues of recurrent mutations in genes such as TP53, PIK3CA, KRAS, EGFR, and BRCA1 across tumor types, which reshaped molecular classifications for diseases including glioblastoma, endometrial carcinoma, and clear cell renal cell carcinoma. Integrative analyses revealed actionable alterations in the PI3K/AKT/mTOR pathway, the RAS/MAPK pathway, and DNA repair mechanisms, informing trials at centers such as MD Anderson Cancer Center and Memorial Sloan Kettering Cancer Center. The project catalyzed precision oncology initiatives, exemplified by programs at Dana–Farber Cancer Institute, and influenced drug development efforts by companies such as Genentech and Pfizer. TCGA-derived molecular subtypes have been incorporated into clinical research on breast cancer subtyping, lung adenocarcinoma stratification, and biomarker discovery, guiding trials at Fred Hutchinson Cancer Research Center and Mayo Clinic. Data from the atlas underpinned numerous high-impact publications and resources used by researchers at Stanford University, Princeton University, Yale University, and international partner institutions.

Data Access and Tools

Data distribution was managed through repositories such as the Genomic Data Commons and the National Center for Biotechnology Information's databases, and through portals run by the Broad Institute and the University of California, Santa Cruz (the UCSC Cancer Genomics Browser). Analytical tools and pipelines included the Broad Institute's Firehose, cBioPortal (developed at Memorial Sloan Kettering Cancer Center), and visualization platforms from the University of California, Santa Cruz. Access to controlled clinical and raw sequence data required authorization through data access committees coordinated with the National Institutes of Health and institutional review boards at partner hospitals. Downstream resources built on TCGA data include machine-learning models developed at Carnegie Mellon University and translational applications pursued at Mayo Clinic and Mount Sinai Health System.
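For readers who want to see what programmatic access to the Genomic Data Commons looks like, the sketch below builds query parameters for listing open-access files from one TCGA project. It only constructs the parameters and does not send a request. The endpoint URL, the nested `op`/`content` filter structure, and field names such as `cases.project.project_id` reflect the GDC REST API as commonly documented, but they should be checked against the current GDC documentation before use. The function name `build_gdc_file_query` is hypothetical.

```python
import json

# Assumed endpoint for file metadata searches in the GDC REST API.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

def build_gdc_file_query(project_id, data_category, size=10):
    """Build query parameters for an open-access file search in one project."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}},
            {"op": "in", "content": {"field": "data_category",
                                     "value": [data_category]}},
            # Restrict to open-access files; controlled-access data
            # requires prior authorization.
            {"op": "in", "content": {"field": "access",
                                     "value": ["open"]}},
        ],
    }
    return {
        "filters": json.dumps(filters),   # filters travel as a JSON string
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),
    }

# Example: parameters for breast cancer expression files.
params = build_gdc_file_query("TCGA-BRCA", "Transcriptome Profiling")
```

The resulting `params` dictionary could be passed as the query string of an HTTP GET to `GDC_FILES_ENDPOINT` with any HTTP client. Keeping query construction separate from the network call makes the filter logic easy to inspect and test offline.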

Criticisms and Limitations

Critiques of the program highlighted the limited representation of diverse populations, prompting comparisons with CDC-led population surveillance and proposals for more inclusive recruitment modeled on cohorts such as the Framingham Heart Study. Other concerns focused on tumor heterogeneity, on single-region sampling biases noted by researchers at Stanford University and Harvard Medical School, and on the technological limits of the early sequencing platforms used by centers including Baylor College of Medicine. The completeness of clinical annotation and the availability of long-term outcome data were uneven across contributing hospitals such as the Cleveland Clinic and Johns Hopkins Hospital, which constrained some translational analyses. Ethical and privacy debates over data-sharing practices engaged organizations such as the Hastings Center and regulators within the Department of Health and Human Services. Despite these limitations, follow-up projects and international consortia have built on TCGA’s framework to address diversity, longitudinal sampling, and single-cell resolution, with studies at institutions including the Wellcome Trust Sanger Institute and the European Molecular Biology Laboratory.
