LLMpedia: The first transparent, open encyclopedia generated by LLMs

The Cancer Genome Atlas

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: The Cancer Genome Atlas
Caption: Logo of the research initiative.
Funding agency: National Cancer Institute, National Human Genome Research Institute
Active: 2006–2018
Focus: Cancer genomics
Website: https://www.cancer.gov/tcga (official website)

The Cancer Genome Atlas (TCGA) was a landmark collaborative, multi-institutional project to systematically map the key genomic alterations in a wide array of human cancers. Jointly funded and managed by the National Cancer Institute and the National Human Genome Research Institute, it created a freely available public repository of molecular data on a scale not previously attempted. Its work has reshaped the scientific understanding of cancer as a disease of the genome and provides a foundational resource for precision oncology research worldwide.

Overview

The initiative aimed to catalog the genetic alterations responsible for cancer by applying high-throughput genome sequencing and related technologies. It characterized over 20,000 primary cancer and matched normal samples across 33 cancer types, including major malignancies such as glioblastoma, lung adenocarcinoma, and breast cancer. The project brought together researchers from institutions such as the Broad Institute, Washington University in St. Louis, and Baylor College of Medicine in a coordinated network for data generation and analysis. It profiled each tumor on multiple molecular platforms, including DNA methylation, gene expression, and copy number variation, to build integrated molecular portraits of tumors.

History and development

The project was launched in 2006 as a pilot program, initially focusing on three cancer types: glioblastoma multiforme, serous cystadenocarcinoma of the ovary, and lung squamous cell carcinoma. The pilot phase, overseen by dedicated project managers and scientific committees, demonstrated the feasibility and value of large-scale, multidimensional tumor characterization. Following the success of the pilot, the project was expanded to a full-scale effort in 2009. Its data generation phase largely concluded by 2018, and its data have since served as a resource for later efforts such as the Cancer Moonshot initiative, leaving a lasting digital legacy for the research community.

Research goals and design

The primary objective was to accelerate the understanding of the molecular basis of cancer by applying genomics and bioinformatics to a large number of well-annotated tumor specimens. Its design involved the coordinated collection of fresh-frozen tumor samples through a network of tissue source sites, which were then centrally processed at designated Biospecimen Core Resources. Comprehensive molecular profiling was conducted at specialized Genome Characterization Centers and Genome Sequencing Centers, using platforms from companies such as Illumina and Affymetrix. Integrative analysis was performed by Genome Data Analysis Centers. The data were managed during the project by a Data Coordinating Center and are now hosted in the Genomic Data Commons, which applies uniform processing to keep them consistent and accessible.
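Because each specimen passes through several sites and centers, TCGA labels every aliquot with a structured barcode that records where it came from and how it was processed. The sketch below is a minimal illustration of reading that barcode, assuming the publicly documented aliquot-level layout (project, tissue source site, participant, sample type and vial, portion and analyte, plate, center) and the convention that sample type codes 01 to 09 denote tumor, 10 to 19 normal, and 20 to 29 control. The barcode scheme is not described in this article, and the function names `parse_barcode` and `participants_with_matched_normal` are hypothetical helpers introduced only for this example.

```python
import re
from collections import defaultdict

# Aliquot-level barcode, e.g. TCGA-02-0001-01C-01D-0182-01 (layout assumed
# from the public TCGA barcode documentation, not from this article).
BARCODE_RE = re.compile(
    r"^TCGA-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"-(?P<sample>\d{2})(?P<vial>[A-Z])"
    r"-(?P<portion>\d{2})(?P<analyte>[A-Z])"
    r"-(?P<plate>[A-Z0-9]{4})-(?P<center>\d{2})$"
)


def parse_barcode(barcode):
    """Split an aliquot barcode into named fields and classify the sample."""
    match = BARCODE_RE.match(barcode)
    if match is None:
        raise ValueError(f"not an aliquot-level TCGA barcode: {barcode!r}")
    fields = match.groupdict()
    code = int(fields["sample"])
    if 1 <= code <= 9:
        fields["sample_class"] = "tumor"
    elif 10 <= code <= 19:
        fields["sample_class"] = "normal"
    elif 20 <= code <= 29:
        fields["sample_class"] = "control"
    else:
        fields["sample_class"] = "unknown"
    return fields


def participants_with_matched_normal(barcodes):
    """Return (tss, participant) pairs that have both tumor and normal aliquots."""
    seen = defaultdict(set)
    for barcode in barcodes:
        fields = parse_barcode(barcode)
        seen[(fields["tss"], fields["participant"])].add(fields["sample_class"])
    return sorted(key for key, classes in seen.items()
                  if {"tumor", "normal"} <= classes)
```

Given a list of aliquot barcodes, `participants_with_matched_normal` reports which participants contributed both a tumor and a normal specimen, which is the pairing that matched tumor and normal comparisons rely on.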

Key findings and data

The project yielded transformative discoveries, reclassifying cancers based on molecular fingerprints rather than solely their tissue of origin. Seminal findings included the identification of four distinct subtypes of glioblastoma with different clinical characteristics and the reclassification of endometrial cancer into four molecular categories more predictive of prognosis. It revealed the extensive genetic heterogeneity within cancers such as breast cancer and colorectal cancer, and characterized recurrent mutations in genes such as IDH1 in glioma and ARID1A in gastric cancer. The final dataset comprises over 2.5 petabytes of raw and analyzed data, encompassing whole exome sequencing, RNA-Seq, and miRNA profiles for thousands of patients.

Data access and usage

The data are made available to the global research community through portals such as the Genomic Data Commons and the University of California, Santa Cruz Xena browser. Processed, de-identified data are open access, while raw sequencing files and other potentially identifying data are controlled access and require authorization through dbGaP. Researchers can work with data at different levels, from raw sequencing files to processed analyses, enabling a wide range of secondary investigations. This data-sharing policy has fueled thousands of studies at institutions from MIT to the European Bioinformatics Institute, facilitating discoveries in tumor immunology, drug resistance, and the development of new biomarkers. Tools for data exploration and visualization, such as cBioPortal, have been developed to lower the barrier to use by clinicians and biologists.
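Programmatic access to the Genomic Data Commons goes through its public REST API. The following is a minimal sketch, assuming the `https://api.gdc.cancer.gov/files` endpoint with its `filters`, `fields`, `format`, and `size` parameters. The project identifier `TCGA-GBM`, the data type string, and the choice to restrict results to files flagged `open` are illustrative assumptions, and `build_query` and `fetch_files` are hypothetical helper names. Field names and accepted values should be checked against the current GDC API documentation before use.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the GDC API for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_query(project_id, data_type, size=10):
    """Build query parameters for file metadata of one project and data type."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}},
            {"op": "in", "content": {"field": "data_type",
                                     "value": [data_type]}},
            # Illustrative choice: only list files marked as open access.
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),
    }


def fetch_files(params):
    """Send the query and return the list of matching file records."""
    url = GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["data"]["hits"]


if __name__ == "__main__":
    query = build_query("TCGA-GBM", "Gene Expression Quantification", size=5)
    for record in fetch_files(query):
        print(record["file_id"], record["file_name"])
```

Building the query separately from sending it keeps the filter logic inspectable without a network connection, and the same parameters can be reused against other GDC endpoints that accept the same filter syntax.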

Impact and legacy

The project's impact on oncology and biomedical science is profound, and it helped establish a new paradigm for cancer research. Its data have become a standard reference, integral to the work of the Clinical Proteomic Tumor Analysis Consortium and of numerous pharmaceutical companies engaged in drug development. Its findings informed the revision of diagnostic criteria by organizations such as the World Health Organization and contributed to the evidence base behind several targeted therapies approved by the FDA. The infrastructure and collaborative model it pioneered have influenced other large-scale efforts, including the International Cancer Genome Consortium and the Alexandria Project, securing its legacy as a catalyst for the era of precision medicine.

Categories: Cancer research, Genomics projects, National Institutes of Health