NCI Genomic Data Commons

Name	NCI Genomic Data Commons
Type	Data repository
Founded	06 June 2016
Location	Chicago, Illinois, United States
Key people	Warren Kibbe, Tony Kerlavage
Parent	National Cancer Institute
Website	https://gdc.cancer.gov/

Contents

Overview
Data and Resources
Access and Tools
Governance and Collaboration
Impact and Use Cases

The NCI Genomic Data Commons (GDC) is a unified data repository that enables data sharing across cancer research programs. It was launched by the National Cancer Institute to standardize and harmonize large-scale genomic and clinical datasets from projects such as The Cancer Genome Atlas. The system gives the cancer research community a scalable platform for accessing and analyzing molecular data to advance precision oncology.

Overview

The Genomic Data Commons was officially launched on 6 June 2016 by the National Cancer Institute, part of the National Institutes of Health. Its creation was driven by the need to manage the large volumes of data generated by projects such as The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program. The platform is hosted at the University of Chicago and runs on the computational infrastructure of the Center for Data Intensive Science. Its core mission is to broaden access to cancer genomic data in support of the Cancer Moonshot and the broader Precision Medicine Initiative.

Data and Resources

The repository consolidates multiple data types, including raw DNA sequencing data, processed mutation calls, RNA-Seq transcriptomes, and DNA methylation profiles. These datasets are harmonized by reprocessing them through standardized pipelines, which draw on tools such as those from the Broad Institute and the University of California, Santa Cruz. Clinical data associated with samples, managed in compliance with protocols from the Cancer Therapy Evaluation Program, is also integrated. Key source projects include The Cancer Genome Atlas, the Therapeutically Applicable Research to Generate Effective Treatments program, and the Cancer Genome Characterization Initiative. All sequencing data is aligned to a common reference genome, GRCh38, to ensure consistency for cross-study analysis.
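As an illustration of how the data types above can be selected, the following minimal Python sketch builds a nested filter expression of the kind the Genomic Data Commons API accepts. The helper names `field_in` and `all_of` are hypothetical, and the field names and values shown (`cases.project.project_id`, `data_category`, `experimental_strategy`) are assumptions based on the publicly documented vocabulary; they should be checked against the current data dictionary before use.

```python
import json


def field_in(field, values):
    """Build one leaf clause: the field must match any of the given values."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def all_of(*clauses):
    """Combine clauses so that every one of them must hold."""
    return {"op": "and", "content": list(clauses)}


# Illustrative selection: RNA-Seq expression files from one TCGA project.
# The project ID and category names are assumptions, not taken from the article.
rnaseq_filter = all_of(
    field_in("cases.project.project_id", ["TCGA-LUAD"]),
    field_in("data_category", ["Transcriptome Profiling"]),
    field_in("experimental_strategy", ["RNA-Seq"]),
)

# Serialize to JSON, the form in which a filter would be sent with a request.
print(json.dumps(rnaseq_filter, indent=2))
```

Keeping each clause as a small dictionary makes it straightforward to swap in another data type, for example a DNA methylation category, without rewriting the rest of the query.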

Access and Tools

Researchers access data through a web-based Data Portal and an Application Programming Interface (API) that supports programmatic queries. The platform provides Data Analysis, Visualization, and Exploration tools for interactive analysis, while cloud-based computation is available through related resources such as the ISB-CGC. To support large-scale analysis, the data is also available on major cloud platforms such as Google Cloud Platform and Amazon Web Services through collaborations with the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative. User support and documentation are provided for researchers, including those at institutions such as the Mayo Clinic and Memorial Sloan Kettering Cancer Center.
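A programmatic query of the kind described above might look like the following Python sketch, which uses only the standard library. It assumes the public API base URL `https://api.gdc.cancer.gov`, a `projects` endpoint, the `fields`, `size`, and `format` query parameters, and a response whose records sit under `data.hits`. The function names `build_url`, `extract_hits`, and `fetch` are hypothetical, and the details should be confirmed against the official API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public base URL of the API.
GDC_API = "https://api.gdc.cancer.gov"


def build_url(endpoint, fields, size=5):
    """Compose a request URL asking for selected fields of a few records."""
    query = urlencode({"fields": ",".join(fields), "size": size, "format": "json"})
    return f"{GDC_API}/{endpoint}?{query}"


def extract_hits(response):
    """Return the list of records from a decoded response, or an empty list."""
    return response.get("data", {}).get("hits", [])


def fetch(endpoint, fields, size=5):
    """Perform the network request and return the matching records."""
    with urlopen(build_url(endpoint, fields, size), timeout=30) as resp:
        return extract_hits(json.load(resp))
```

Calling `fetch("projects", ["project_id", "name", "primary_site"])` would, under these assumptions, return a short list of dictionaries, one per project. Separating URL construction from the network call keeps the query logic easy to inspect and test without contacting the server.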

Governance and Collaboration

Governance is overseen by the National Cancer Institute, with guidance from external scientific panels and the Frederick National Laboratory for Cancer Research. The platform operates through collaborations with bioinformatics centers, including the Institute for Systems Biology and the Ontario Institute for Cancer Research. Data submission and use policies are designed to align with standards from the Global Alliance for Genomics and Health and the NIH Data Sharing Policy. These partnerships help the resource evolve with the field by incorporating data from international consortia and new findings from the NCI-MATCH trial.

Impact and Use Cases

The resource is widely used to discover new biomarkers and therapeutic targets, in support of the aims of the Precision Medicine Initiative. It enables validation studies for drugs developed by companies such as Genentech and Bristol Myers Squibb. Researchers at institutions such as Dana-Farber Cancer Institute and MD Anderson Cancer Center routinely use it to identify molecular subtypes of diseases such as glioblastoma and lung adenocarcinoma. Its aggregation of data from rare cancers has been particularly valuable because it provides statistical power for studies that individual laboratories could not conduct alone, which accelerates translational research.
