Pan-Cancer Analysis of Whole Genomes

Title	Pan-Cancer Analysis of Whole Genomes
Established	2018–2020
Participants	International Cancer Genome Consortium, The Cancer Genome Atlas, Wellcome Trust Sanger Institute, European Molecular Biology Laboratory, Broad Institute, Genomics England
Location	Cambridge, England; Boston, Massachusetts; Hinxton
Focus	Whole-genome sequencing of cancer

Contents

Background and Objectives
Methods and Data Collection
Key Findings and Mutational Landscapes
Biological and Clinical Implications
Computational Tools and Analytical Frameworks
Limitations and Challenges
Future Directions and Ongoing Research

The Pan-Cancer Analysis of Whole Genomes (PCAWG) is a coordinated international project that performed integrated whole-genome sequencing across many tumor types to map somatic mutations, structural variants, and noncoding alterations. The effort aggregated data from major consortia and institutions to create a unified atlas linking genomic events to cancer phenotypes and clinical annotations. It produced comprehensive catalogs and methodological standards used by groups in translational oncology, genomics, and computational biology.

Background and Objectives

The initiative grew from collaborations among the International Cancer Genome Consortium, The Cancer Genome Atlas, the Wellcome Trust Sanger Institute, the Broad Institute, the European Molecular Biology Laboratory, and national programs such as Genomics England and the National Institutes of Health. Key objectives included harmonizing whole-genome sequencing across cohorts assembled by Dana-Farber Cancer Institute, Memorial Sloan Kettering Cancer Center, Institut Curie, Fred Hutchinson Cancer Research Center, and other centers; identifying driver mutations in coding and noncoding regions; and establishing resources for investigators at Cold Spring Harbor Laboratory, Harvard Medical School, Stanford University, and the University of Oxford. The project aimed to inform efforts by agencies like the National Cancer Institute and funders such as the Wellcome Trust and Cancer Research UK.

Methods and Data Collection

Samples and clinical metadata were contributed by biobanks and hospitals including Royal Marsden Hospital, Guy's and St Thomas' NHS Foundation Trust, Addenbrooke's Hospital, Johns Hopkins Hospital, and Mayo Clinic. Sequencing pipelines were implemented at centers such as the Wellcome Sanger Institute and the Broad Institute, using platforms associated with Illumina and Oxford Nanopore Technologies and practices established by consortia like the 1000 Genomes Project. Computational workflows incorporated tools and practices from the Genome Analysis Toolkit, Picard, and projects at the European Bioinformatics Institute (EMBL-EBI). Data governance drew on frameworks from the Global Alliance for Genomics and Health, while ethical oversight was informed by guidelines from the World Health Organization and by national review boards at University of California, San Francisco and King's College London.

Key Findings and Mutational Landscapes

The analysis revealed recurrent patterns across tumor types studied at institutions including Memorial Sloan Kettering Cancer Center, MD Anderson Cancer Center, the University of Tokyo, and Karolinska Institutet. It cataloged coding drivers previously noted in work from Venter Institute collaborators and identified noncoding regulatory mutations paralleling findings from ENCODE and the Roadmap Epigenomics Project. Structural variation signatures correlated with reports from Cancer Research UK teams and laboratories at Cold Spring Harbor Laboratory and Dana-Farber Cancer Institute. The project described mutational signatures related to exposures characterized in studies by the International Agency for Research on Cancer, with patterns reminiscent of research from Harvard T.H. Chan School of Public Health and Mount Sinai Health System cohorts.
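Mutational signatures of the kind mentioned above are usually derived from counts of single-base substitutions grouped by their trinucleotide context. The following is a minimal sketch of that counting step, assuming the widely used 96-channel convention in which every substitution is reported relative to the pyrimidine (C or T) strand. The function names `sbs_channel` and `count_channels` are hypothetical and are not taken from the project's own software.

```python
from collections import Counter

# Watson-Crick complements, used to flip a mutation onto the pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs_channel(context, alt):
    """Label one substitution, e.g. 'A[C>T]G'.

    context: reference trinucleotide centred on the mutated base.
    alt: the base observed in the tumor at the central position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt not in COMPLEMENT:
        raise ValueError("expected a 3-base context and a single alt base")
    ref = context[1]
    if ref == alt:
        raise ValueError("alt base equals the reference base")
    if ref in "AG":
        # Purine reference: report the event on the opposite strand instead.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def count_channels(mutations):
    """Tally (context, alt) pairs into a per-channel count profile."""
    return Counter(sbs_channel(context, alt) for context, alt in mutations)


if __name__ == "__main__":
    # Illustrative input only; these are not real project calls.
    demo = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]
    for channel, n in sorted(count_channels(demo).items()):
        print(channel, n)
```

The per-sample profile produced this way is the typical input to signature extraction or fitting methods, which are outside the scope of this sketch.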

Biological and Clinical Implications

Findings influenced clinical interpretation at centers like Memorial Sloan Kettering Cancer Center and guidelines produced by bodies such as the American Society of Clinical Oncology and the European Society for Medical Oncology. Noncoding driver discoveries informed translational work at Novartis, Roche, and academic translational units at Fred Hutchinson Cancer Research Center and University College London. Insights into structural variants and chromothripsis affected diagnostic assays developed at Mayo Clinic and precision oncology programs at Dana-Farber Cancer Institute and MD Anderson Cancer Center. The atlas supported biomarker efforts linked to immunotherapy trials conducted at Dana-Farber Cancer Institute and by pharmaceutical partners including Merck and Bristol-Myers Squibb.

Computational Tools and Analytical Frameworks

The consortium standardized pipelines drawing on software ecosystems developed at the Broad Institute, the European Bioinformatics Institute, the Wellcome Sanger Institute, and research groups at Stanford University School of Medicine. Analytical frameworks integrated variant callers and structural variant tools also used in studies from Harvard Medical School, Yale School of Medicine, and Columbia University Irving Medical Center. Data portals and visualization were influenced by platforms such as cBioPortal and the UCSC Genome Browser, and by resources maintained at Ensembl and GENCODE. Reproducibility initiatives followed practices advocated by the Open Science Framework and data sharing principles from the Global Alliance for Genomics and Health.
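One common way to integrate the output of several variant callers is to keep only the variants that a minimum number of callers agree on. The sketch below illustrates that idea as a simple vote. It is an assumption made for illustration, not a description of the consortium's documented procedure, and the names `consensus_calls`, `min_support`, and the caller labels are hypothetical.

```python
from collections import Counter


def consensus_calls(callsets, min_support=2):
    """Return variants reported by at least `min_support` callers.

    callsets: dict mapping a caller name to an iterable of variants,
              each variant being a (chrom, pos, ref, alt) tuple.
    """
    support = Counter()
    for calls in callsets.values():
        # set() ensures each caller votes at most once per variant.
        support.update(set(calls))
    return sorted(variant for variant, n in support.items() if n >= min_support)


if __name__ == "__main__":
    # Illustrative input only; these are not real project calls.
    demo = {
        "caller_a": [("chr1", 100, "C", "T"), ("chr2", 50, "G", "A")],
        "caller_b": [("chr1", 100, "C", "T")],
        "caller_c": [("chr1", 100, "C", "T"), ("chr2", 50, "G", "A"),
                     ("chr3", 7, "A", "G")],
    }
    for variant in consensus_calls(demo, min_support=2):
        print(variant)
```

Raising `min_support` trades sensitivity for precision: a stricter threshold drops variants seen by only one or two callers, which removes more caller-specific artifacts but may also lose true low-frequency events.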

Limitations and Challenges

The project faced challenges noted by contributors at the Wellcome Sanger Institute, the Broad Institute, and the European Molecular Biology Laboratory: heterogeneous clinical annotations from hospitals such as Royal Marsden Hospital and Johns Hopkins Hospital; sequencing depth that varied relative to cohort studies like the 100,000 Genomes Project; and limited ability to detect complex rearrangements, as highlighted by groups at Cold Spring Harbor Laboratory. Ethical and legal constraints mirrored cases discussed by World Health Organization committees and national regulators including the Health Research Authority (UK) and the U.S. Food and Drug Administration. Analytical biases and batch effects echoed concerns raised in literature from Harvard T.H. Chan School of Public Health and the University of California, Los Angeles.

Future Directions and Ongoing Research

Ongoing work builds on collaborations with Genomics England, the All of Us Research Program, the 100,000 Genomes Project, and clinical networks at Memorial Sloan Kettering Cancer Center and Mayo Clinic to expand cohort diversity and longitudinal sampling, alongside single-cell integration led by groups at Stanford University and the Broad Institute. Integrative efforts link epigenomic maps from ENCODE and the Roadmap Epigenomics Project with proteogenomic data developed by the Clinical Proteomic Tumor Analysis Consortium and translational programs at Fred Hutchinson Cancer Research Center. Future aims include clinical translation through partnerships with regulatory agencies such as the European Medicines Agency and the U.S. Food and Drug Administration, and implementation in precision oncology consortia at institutions including Dana-Farber Cancer Institute and MD Anderson Cancer Center.
