ENCODE Project Consortium

The ENCODE (Encyclopedia of DNA Elements) Project Consortium is an international, multi-institutional research initiative launched by the National Human Genome Research Institute (NHGRI) in 2003 to identify and map the functional elements of the human genome. It has involved laboratories at institutions such as the Broad Institute, the Wellcome Trust Sanger Institute, the European Molecular Biology Laboratory, and the University of California, San Diego, and it has coordinated efforts comparable in scale to the Human Genome Project, the 1000 Genomes Project, and the International HapMap Project. The Consortium’s results have appeared in journals including Nature, Nature Genetics, Science, and Genome Research.

Background and Origins

The Consortium emerged from meetings involving leaders at the National Institutes of Health and the United States Department of Energy, together with senior investigators associated with Human Genome Project milestones at Baylor College of Medicine, Stanford University, the Massachusetts Institute of Technology, Harvard University, and Cold Spring Harbor Laboratory. Key precursor projects and infrastructure included the Human Genome Project, the International HapMap Project, and initiatives at the Wellcome Trust Sanger Institute. Figures involved in early planning came from institutions such as Johns Hopkins University, the University of Cambridge, Yale University, the University of Washington in Seattle, and Rockefeller University.

ENCODE’s central objective paralleled the missions of the Human Genome Project and the 1000 Genomes Project: to identify all functional elements in the human genome sequence produced by the International Human Genome Sequencing Consortium. A pilot phase, running from 2003 to 2007, examined about 1% of the genome in 44 selected regions before the project expanded to the whole genome. The work was distributed across nodes such as the Broad Institute, the Genome Institute at Washington University in St. Louis, the European Bioinformatics Institute, and the Wellcome Trust Sanger Institute, with clinical-community partners such as the Mayo Clinic and the Cleveland Clinic. The project’s downstream applications intersected with efforts by the National Cancer Institute, the American Cancer Society, the Howard Hughes Medical Institute, and the Bill & Melinda Gates Foundation, and with translational research at the University of Oxford, Imperial College London, and the University of California, San Francisco.

The Consortium applied experimental and computational methods previously used in projects at the Broad Institute, the Wellcome Trust Sanger Institute, and EMBL-EBI. Its high-throughput assays included chromatin immunoprecipitation followed by sequencing (ChIP-seq), which maps transcription factor binding and histone modifications, used by teams at Stanford University, the University of California, Berkeley, and Washington University in St. Louis; RNA-seq, for profiling transcripts, with pipelines refined at Harvard University, MIT, and Cold Spring Harbor Laboratory; DNase-seq, for locating accessible (open) chromatin, with protocols implemented at Yale University and the University of Chicago; and chromosome conformation capture by sequencing (Hi-C), which measures three-dimensional contacts between genomic regions, used in labs affiliated with the Massachusetts Institute of Technology, the University of Toronto, and the European Molecular Biology Laboratory. Data processing drew on resources such as NCBI, the European Nucleotide Archive, and the UCSC Genome Browser, together with gene annotations from Ensembl, GENCODE, and RefSeq. Experimental workflows were coordinated across facilities including Lawrence Berkeley National Laboratory and Los Alamos National Laboratory, with computing resources at Oak Ridge National Laboratory.
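The core idea behind enrichment-based assays such as ChIP-seq and DNase-seq can be illustrated with a deliberately simplified sketch. It assumes reads have already been aligned and reduced to single-base positions on one chromosome, bins them into fixed windows, and flags windows where treatment reads exceed control reads by a fold-change threshold. The function name `call_enriched_windows` and its parameters (`window`, `min_reads`, `min_fold`) are hypothetical; production pipelines rely on dedicated peak callers with statistical models rather than a fixed cutoff.

```python
from collections import Counter


def call_enriched_windows(treatment, control, window=200, min_reads=5, min_fold=4.0):
    """Toy enrichment caller (hypothetical; not an ENCODE pipeline step).

    treatment, control: iterables of aligned read positions (0-based) on one chromosome.
    Returns half-open (start, end) windows where treatment reads are enriched.
    """
    # Count reads per fixed-size, non-overlapping window.
    t_counts = Counter(pos // window for pos in treatment)
    c_counts = Counter(pos // window for pos in control)

    enriched = []
    for w in sorted(t_counts):
        # Pseudocount of 1 keeps the ratio finite when the control window is empty.
        fold = t_counts[w] / (c_counts[w] + 1)
        if t_counts[w] >= min_reads and fold >= min_fold:
            enriched.append((w * window, (w + 1) * window))
    return enriched


chip_reads = [1000 + i for i in range(10)] + [5000]
input_reads = [1050, 5010]
print(call_enriched_windows(chip_reads, input_reads))  # [(1000, 1200)]
```

Here the ten clustered reads near position 1000 outnumber the single control read in that window by a factor of 5, so that window is reported, while the isolated read at 5000 falls below `min_reads`.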

In its 2012 integrated analysis, the Consortium reported that about 80% of the human genome could be assigned at least one biochemical function, such as being transcribed, bound by a transcription factor, or marked by open chromatin or a histone modification. This result built on the sequence catalog of the Human Genome Project and on annotations from RefSeq and Ensembl. ENCODE’s contributions influenced studies at the National Cancer Institute, the International Cancer Genome Consortium, and disease-focused groups at the Michael J. Fox Foundation and the Alzheimer's Association. Its datasets enabled discoveries in regulatory genomics that intersected with the 1000 Genomes Project and the GTEx Consortium, and they informed variant interpretation in clinical resources such as ClinVar and OMIM. The project also supported the creation and refinement of resources such as GENCODE and the UCSC Genome Browser, as well as standardized processing pipelines adopted by groups at the Broad Institute, the Wellcome Trust Sanger Institute, and EMBL-EBI.
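A genome-wide fraction like this depends on how overlapping annotations are combined: a base marked by several assays should be counted only once. A minimal sketch of that bookkeeping, assuming annotations are given as BED-style half-open `(chrom, start, end)` intervals and chromosome lengths are known, merges overlaps per chromosome and divides covered bases by total length. The function names and toy coordinates are hypothetical and are not ENCODE data.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def covered_fraction(records, chrom_sizes):
    """Fraction of the genome covered by at least one (chrom, start, end) record."""
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))
    covered = sum(
        end - start
        for intervals in by_chrom.values()
        for start, end in merge_intervals(intervals)
    )
    return covered / sum(chrom_sizes.values())


# Toy annotations from two hypothetical assays that partly overlap on chr1.
records = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 10, 60)]
sizes = {"chr1": 1000, "chr2": 500}
print(covered_fraction(records, sizes))
```

In this toy case the annotations cover 200 of 1,500 bases, a fraction of about 0.133. Because the union can only grow as more assays and cell types are added, this style of counting is sensitive to how inclusively "activity" is defined.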

The Consortium was organized as a network of production laboratories, data coordination centers, and analysis groups drawn from institutions including the Broad Institute, the Wellcome Trust Sanger Institute, the European Bioinformatics Institute, Harvard Medical School, the University of California, San Diego, Stanford University, Yale University, Johns Hopkins University, Cold Spring Harbor Laboratory, and Washington University in St. Louis. Its governance mirrored models used by the Human Genome Project and the 1000 Genomes Project, with oversight bodies interacting with funders such as the National Institutes of Health and the Wellcome Trust and with international partners in European Union research frameworks. Collaborative results were presented at meetings of the American Society of Human Genetics and at Cold Spring Harbor Laboratory, and published in major journals including Nature, Science, and Cell.

The Consortium’s claim about the proportion of the genome with biochemical activity sparked debate among researchers at institutions including Harvard University, the Massachusetts Institute of Technology, the University of Chicago, the Broad Institute, and Cold Spring Harbor Laboratory. Critics argued that biochemical activity alone does not establish biological function and that only a small fraction of the genome shows evidence of evolutionary constraint; they drew on conceptual frameworks from evolutionary biology associated with the University of Oxford, the University of California, Berkeley, and Princeton University, and on population-genetic measures used by groups at Stanford University, Yale University, and the University of Michigan. Responses drew on data from other consortia such as the 1000 Genomes Project and the GTEx Consortium and prompted methodological refinements adopted at EMBL-EBI, GENCODE, and the UCSC Genome Browser. The debate also shaped policy discussions involving the National Institutes of Health and funders such as the Wellcome Trust, and it led to later analyses, by research teams at the University of Cambridge, the Max Planck Institute, and the Salk Institute among others, that sought to reconcile biochemical activity with evolutionary conservation.
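Analyses reconciling biochemical activity with evolutionary conservation ask, in part, what share of biochemically active sequence also lies in constrained sequence. A minimal single-chromosome sketch of that calculation, assuming both sets are half-open `(start, end)` intervals, intersects them with a two-pointer sweep. The function names and coordinates are hypothetical; real analyses must also handle multiple chromosomes, the choice of constraint metric, and background expectations.

```python
def merge(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def overlap_bp(a, b):
    """Bases shared by two interval sets, via a two-pointer sweep over merged lists."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first; it cannot overlap anything further.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_constrained(active, conserved):
    """Share of active bases that also fall within conserved intervals."""
    active_bp = sum(end - start for start, end in merge(active))
    return overlap_bp(active, conserved) / active_bp if active_bp else 0.0


active = [(0, 100), (200, 300)]  # hypothetical "biochemically active" regions
conserved = [(50, 250)]          # hypothetical constrained region
print(fraction_constrained(active, conserved))  # 0.5
```

Merging each set first keeps the sweep linear after sorting and ensures that overlapping annotations within one set are not double-counted.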