ENCODE Pilot Project

Name	ENCODE Pilot Project
Founded	2003
Field	Genomics, Molecular Biology
Coordinator	National Human Genome Research Institute
Country	United States

The ENCODE Pilot Project was the early phase of a large-scale initiative to map functional elements in the human genome. It was launched under the auspices of the National Human Genome Research Institute and conducted with participation from institutions such as the Wellcome Trust Sanger Institute, the Broad Institute, the University of California, Berkeley, the Cold Spring Harbor Laboratory, and the European Molecular Biology Laboratory. The pilot phase brought together investigators who had worked on the Human Genome Project, in the International Human Genome Sequencing Consortium, in other genome research groups, and in consortia funded by the National Institutes of Health, with the goal of developing assays and standards for functional annotation. The effort preceded and informed later work, including the Roadmap Epigenomics Project and international collaborations with groups such as the European Bioinformatics Institute and the Canadian Institutes of Health Research.

Background and Objectives

The Pilot Project aimed to test experimental, computational, and organizational approaches to comprehensive annotation by targeting approximately 1% of the Homo sapiens reference genome, coordinating expertise from the National Human Genome Research Institute, the Wellcome Trust, the Howard Hughes Medical Institute, the Broad Institute, and the Cold Spring Harbor Laboratory. Objectives included identifying transcriptional units using methods developed in laboratories affiliated with the National Institutes of Health, mapping chromatin marks using protocols from teams at the European Molecular Biology Laboratory and the Wellcome Trust Sanger Institute, and developing data-sharing standards consistent with practices at the International Nucleotide Sequence Database Collaboration and GenBank. The Pilot Project also sought to bridge technologies from the computational biology communities at the Massachusetts Institute of Technology, the University of California, San Diego, and Stanford University.

Groups at centers including the Broad Institute, the Wellcome Trust Sanger Institute, the European Molecular Biology Laboratory, the University of Washington, and the Whitehead Institute implemented assays such as chromatin immunoprecipitation followed by microarray hybridization or sequencing, adapted from methods refined at the Cold Spring Harbor Laboratory and the Genome Institute at Washington University. The design selected 44 discrete genomic regions representing medically and evolutionarily relevant loci studied by researchers connected to the National Cancer Institute, the Howard Hughes Medical Institute, and academic groups at the University of California, San Francisco and Johns Hopkins University. Experimental approaches combined RNA profiling strategies from labs at the Salk Institute and the Broad Institute with computational pipelines developed by teams associated with the European Bioinformatics Institute. Quality control and data standards were harmonized with practices from the International HapMap Project.

The Pilot Project reported extensive evidence for transcription, chromatin modification, and regulatory signatures across the targeted regions, demonstrating that functional elements extend beyond protein-coding sequences. These findings were consistent with work from the Human Genome Project and with subsequent studies by the Roadmap Epigenomics Project, the 1000 Genomes Project, and the Genotype-Tissue Expression consortium. Results showed widespread RNA transcription, detected by methods used in labs at the Cold Spring Harbor Laboratory, the Salk Institute, and the Broad Institute, while chromatin marks mapped by teams at the Wellcome Trust Sanger Institute and the European Molecular Biology Laboratory revealed promoters and enhancers comparable to annotations from the RefSeq and Ensembl projects. The Pilot Project also advanced computational motif discovery and regulatory network inference, drawing on algorithms from groups at the Massachusetts Institute of Technology, Stanford University, and the University of California, Berkeley.

Data generated by the Pilot Project were deposited in public repositories coordinated by the National Center for Biotechnology Information, the European Bioinformatics Institute, and the DNA Data Bank of Japan, following precedents set by the International Human Genome Sequencing Consortium and the International Nucleotide Sequence Database Collaboration. Resources included mapped chromatin immunoprecipitation data, transcription profiling data, and DNase I hypersensitivity data, with associated metadata standardized in line with National Institutes of Health data-sharing policies and curated in databases used by researchers at the Broad Institute, the Wellcome Trust Sanger Institute, and the European Molecular Biology Laboratory. The project produced software pipelines and quality metrics adopted by later phases of the ENCODE Project Consortium, the Roadmap Epigenomics Project, and clinical sequencing efforts at the Mayo Clinic and the National Cancer Institute.

The Pilot Project influenced experimental standards used by large-scale efforts including the main phase of the ENCODE Project Consortium, the Roadmap Epigenomics Project, the Genotype-Tissue Expression project, and translational initiatives at the National Cancer Institute and the Wellcome Trust. It extended data-sharing models established by the International Human Genome Sequencing Consortium, informed policy discussions at the National Institutes of Health, and contributed methods widely used in laboratories at the Broad Institute, the Salk Institute, and the European Molecular Biology Laboratory. Its legacy includes methodological advances incorporated into pipelines at the Genome Institute at Washington University, educational materials at the Cold Spring Harbor Laboratory, and standards adopted by the European Bioinformatics Institute.

Critics from institutions such as the Wellcome Trust, the National Institutes of Health, and academic groups at Harvard University and the University of Cambridge debated the interpretation of "function" used by the project, citing differing perspectives from evolutionary analyses by researchers affiliated with the Max Planck Society and from the genomic annotation standards championed by the RefSeq and Ensembl communities. Controversies included discussions between the National Human Genome Research Institute and commentators connected to the Howard Hughes Medical Institute over where to set thresholds for biochemical activity as opposed to evolutionary constraint, echoing debates in forums involving the European Molecular Biology Laboratory and the Wellcome Trust Sanger Institute. The debates spurred methodological refinements and follow-up studies from groups at the Broad Institute, the Salk Institute, and the University of California, Berkeley.
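
The distinction between biochemical activity and evolutionary constraint can be made concrete with a toy calculation. The Python sketch below is illustrative only: the coordinates are invented rather than taken from ENCODE data, the function names are hypothetical, and intervals are assumed to be half-open (start, end) pairs on a single chromosome. It computes the fraction of "biochemically active" bases that also fall within "constrained" intervals, which is one simple way the two definitions of function could be compared.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def total_length(intervals):
    """Number of bases covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersect(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def fraction_constrained(active, constrained):
    """Fraction of active bases that are also evolutionarily constrained."""
    active = merge_intervals(active)
    constrained = merge_intervals(constrained)
    active_bases = total_length(active)
    if active_bases == 0:
        return 0.0
    return total_length(intersect(active, constrained)) / active_bases


# Invented toy coordinates, not real annotations.
active = [(100, 200), (150, 300), (500, 600)]
constrained = [(180, 220), (550, 700)]
print(fraction_constrained(active, constrained))
```

With these toy inputs, 300 bases are marked active and 90 of them overlap constrained intervals. A low overlap in real data would be read differently by each side of the debate: as evidence of lineage-specific function, or as evidence that biochemical activity alone overstates function.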