Cancer Cell Line Encyclopedia

Name	Cancer Cell Line Encyclopedia
Established	2010
Focus	Cancer genomics, pharmacogenomics
Country	United States
Institution	Broad Institute; Novartis

Contents

Overview
History and Development
Dataset Composition and Methodology
Applications in Research and Drug Discovery
Major Findings and Impact
Limitations and Criticisms

The Cancer Cell Line Encyclopedia (CCLE) is a collaborative cancer genomics resource created to profile hundreds of human cancer cell lines through genomic, transcriptomic, and pharmacologic characterization. It provides shared datasets for preclinical modeling that are used by researchers at institutions such as the Broad Institute, Novartis, Harvard Medical School, and Massachusetts Institute of Technology, and at pharmaceutical companies such as Pfizer and Roche. The project has influenced translational efforts at academic centers including Stanford University, University of California, San Francisco, and Johns Hopkins University, and at consortia such as the International Cancer Genome Consortium and The Cancer Genome Atlas.

Overview

The project compiles multi-omic profiles and drug-response measurements for diverse human cancer cell lines, derived from tumors of the kinds characterized by groups like Memorial Sloan Kettering Cancer Center, Dana-Farber Cancer Institute, and MD Anderson Cancer Center. Researchers integrate these datasets with resources such as GenBank, Ensembl, UCSC Genome Browser, Gene Expression Omnibus, and ArrayExpress, which allows cross-referencing with landmark studies from NIH, the Wellcome Trust Sanger Institute, and the European Bioinformatics Institute. The resource supports analyses that use Broad Institute platforms, pipelines built on GATK, and visualization through projects like cBioPortal and Oncomine.

History and Development

Initial development involved collaborations between scientists affiliated with the Broad Institute and pharmaceutical partners such as Novartis, along with interactions with investigators from Dana-Farber Cancer Institute, Massachusetts General Hospital, and Cold Spring Harbor Laboratory. Key contributors published early findings in journals such as Nature, Science, Cell, and Nature Genetics. The initiative paralleled efforts by the groups behind The Cancer Genome Atlas and the International Cancer Genome Consortium, and it influenced subsequent projects, including work at the Sanger Institute and programs funded by the National Cancer Institute and the European Research Council.

Dataset Composition and Methodology

The collection encompasses cell lines originating from tumor specimens cataloged by institutions such as Memorial Sloan Kettering Cancer Center, Mount Sinai Hospital, and Mayo Clinic. Molecular assays include whole-exome sequencing processed with pipelines built on GATK; RNA sequencing aligned to Ensembl references and annotated against RefSeq; copy-number analysis compared against datasets from the 1000 Genomes Project; and variant interpretation informed by resources like ClinVar and COSMIC. Drug sensitivity was profiled using compounds supplied by companies like Novartis, Merck, AstraZeneca, and GlaxoSmithKline, and analyzed with statistical frameworks used in publications in Nature Medicine and Genome Research. Data processing and provenance follow standards advocated by NCI and NIH Data Commons and by data-sharing platforms such as Synapse.

Applications in Research and Drug Discovery

Researchers at centers including Stanford University, University of California, San Diego, Columbia University, and Yale University, and in industry labs at Pfizer and AstraZeneca, use the resource to prioritize therapeutic targets, validate biomarkers, and guide preclinical testing. Integrative analyses have combined it with datasets from The Cancer Genome Atlas and with functional screens from Project DRIVE and DepMap to nominate dependencies linked to alterations cataloged in COSMIC and ClinVar. Drug-repurposing studies have referenced work from the Novartis Institutes for BioMedical Research, and trials registered with ClinicalTrials.gov at centers like Memorial Sloan Kettering Cancer Center and MD Anderson Cancer Center often cite CCLE-derived hypotheses in their design.
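Integrative analyses of this kind generally start by matching records for the same cell line across datasets. The following is a minimal sketch of that step, not the project's actual pipeline. The function names, the normalization rule, and all values are hypothetical and chosen only for illustration.

```python
def normalize_name(name: str) -> str:
    """Collapse case and punctuation so 'MCF-7' and 'mcf7' compare equal.

    This rule is an assumption made for the sketch; real identifiers may
    need a curated mapping, since aggressive normalization can merge
    distinct lines.
    """
    return "".join(ch for ch in name.upper() if ch.isalnum())


def join_by_cell_line(mutation_status, dependency_score):
    """Pair each cell line's mutation flag with its dependency score.

    Only cell lines present in both inputs are kept.
    """
    scores = {normalize_name(n): s for n, s in dependency_score.items()}
    joined = {}
    for name, is_mutant in mutation_status.items():
        key = normalize_name(name)
        if key in scores:
            joined[key] = (is_mutant, scores[key])
    return joined


# Hypothetical inputs: mutation flags for one gene, and screen scores
# for the same gene from a separate dataset with different naming.
mutation_status = {"MCF-7": True, "A549": False, "HT-29": True}
dependency_score = {"mcf7": -1.2, "a549": -0.1, "HeLa": -0.4}

joined = join_by_cell_line(mutation_status, dependency_score)
print(joined)
```

Cell lines that appear in only one input (here HT-29 and HeLa) are dropped, so the size of the joined set is worth checking before any downstream statistics are computed.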

Major Findings and Impact

Analyses using the dataset have revealed recurrent associations between genomic alterations cataloged in COSMIC and patterns of drug sensitivity, informing biomarker development that influenced clinical strategies at institutions such as Dana-Farber Cancer Institute and Massachusetts General Hospital. Findings published in outlets like Nature, Cell, and Nature Genetics shaped follow-up studies at the Sanger Institute, the Broad Institute, and pharmaceutical R&D groups at Roche and Merck. The resource also prompted discussions of reproducibility, reflected in policy dialogues at NIH and in data standards efforts at the European Bioinformatics Institute.
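An association between a genomic alteration and drug sensitivity can be tested in many ways. The sketch below uses a simple two-sided permutation test on the difference in mean response between mutant and wild-type cell lines. It is one illustrative option, not the statistical framework used in the published analyses, and the scores are invented for the example.

```python
import random
from statistics import mean


def permutation_test(mutant, wild_type, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in mean drug response.

    Returns the observed difference (mutant minus wild type) and an
    estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(mutant) - mean(wild_type)
    pooled = list(mutant) + list(wild_type)
    n_mut = len(mutant)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_mut]) - mean(pooled[n_mut:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical sensitivity scores (lower = more sensitive); not real data.
mutant_lines = [0.21, 0.18, 0.30, 0.25, 0.15, 0.22]
wild_type_lines = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59, 0.52]

effect, p = permutation_test(mutant_lines, wild_type_lines)
print(f"mean difference = {effect:.3f}, permutation p = {p:.4f}")
```

A permutation test makes no distributional assumption about the scores, which suits small groups of cell lines. Screening many alterations and compounds at once would additionally require a multiple-testing correction, which this sketch omits.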

Limitations and Criticisms

Critiques from investigators at Cold Spring Harbor Laboratory and Stanford University, and from ethicists associated with Harvard Medical School, emphasize that cell-line models diverge from the patient tumors profiled by The Cancer Genome Atlas and the International Cancer Genome Consortium. The divergence is attributed to long-term culture, clonal selection, and the absence of the microenvironment components characterized in studies from MD Anderson Cancer Center and Memorial Sloan Kettering Cancer Center. Methodological concerns raised in commentary published in Nature Reviews Cancer and Genome Biology include batch effects, annotation inconsistencies relative to GenBank and RefSeq, and discrepancies with patient-derived xenografts studied at Fred Hutchinson Cancer Research Center and with organoid models developed at the Hubrecht Institute.
