| CpG islands | |
|---|---|
| Name | CpG island |
| Organism | Vertebrates |
| Length | Typically 200–2000 bp |
| GC content | High (typically >50%) |
| Methylation | Usually unmethylated at promoters |
| Discovered | 1980s |

CpG islands are short genomic regions with elevated GC content and a high frequency of the cytosine–phosphate–guanine (CpG) dinucleotide, which is otherwise depleted across most of the vertebrate genome. They are commonly found near transcription start sites. First characterized in the 1980s in human and mouse DNA, they are associated with the promoters of housekeeping genes and are central to molecular biology, epigenetics research, and the work of genome sequencing consortia. Research on CpG islands intersects with work from institutions such as Cold Spring Harbor Laboratory, the Sanger Institute, the Broad Institute, and the National Institutes of Health, and with projects such as the Human Genome Project and the ENCODE Project.
CpG islands are defined operationally by thresholds on length, GC content, and the observed-to-expected CpG ratio. The expected CpG count is the product of the C and G counts divided by the region length. The classic criteria of Gardiner-Garden and Frommer (1987) specify regions longer than 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. Later genome-wide analyses proposed alternative, generally stricter definitions. Structurally, CpG islands at promoters are enriched for unmethylated cytosines. They also contain binding motifs for transcription factors, such as the GC box recognized by Sp1.
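The classic thresholds can be checked with a short script. The sketch below is a minimal illustration that assumes a single DNA strand given as a plain string. The function names cpg_stats and meets_classic_criteria are hypothetical. The observed/expected ratio is computed as (CpG count × length) / (C count × G count), and the thresholds are passed as parameters so that stricter definitions can be tried.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one DNA strand.

    Observed/expected CpG = (CpG count * length) / (C count * G count).
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc = (c + g) / n if n else 0.0
    oe = cpg * n / (c * g) if c and g else 0.0
    return gc, oe


def meets_classic_criteria(seq: str, min_len: int = 200,
                           min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the length > 200 bp, GC > 50%, obs/exp > 0.6 thresholds."""
    gc, oe = cpg_stats(seq)
    return len(seq) > min_len and gc > min_gc and oe > min_oe


print(cpg_stats("CG" * 150))               # (1.0, 2.0)
print(meets_classic_criteria("CG" * 150))  # True
```

A random sequence has an observed/expected ratio near 1. Bulk vertebrate DNA scores much lower because its CpGs have been depleted.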
CpG islands are nonrandomly distributed across vertebrate chromosomes. This has been shown by sequencing consortia including the Human Genome Project and the Mouse Genome Sequencing Consortium, and by comparative efforts involving Genome Canada and Baylor College of Medicine. They cluster near the transcription start sites of genes cataloged in the NCBI, Ensembl, and UCSC Genome Browser databases, and are often associated with the promoters of housekeeping genes. Identification methods use sliding-window scans, CpG-ratio metrics, and machine-learning classifiers. Implementations include the EMBOSS program cpgplot, which is distributed by the European Bioinformatics Institute, and the CpG island annotation track of the UCSC Genome Browser.
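A sliding-window scan can be sketched in a few lines. This is a simplified illustration, not the algorithm used by cpgplot or the UCSC track. The 200-bp window, 1-bp step, and merging rule are assumptions, and window_stats and find_cpg_islands are hypothetical names. Each window is tested against the GC and observed/expected thresholds, and overlapping qualifying windows are merged into half-open intervals.

```python
def window_stats(s: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for an uppercase window."""
    n = len(s)
    c, g, cpg = s.count("C"), s.count("G"), s.count("CG")
    gc = (c + g) / n if n else 0.0
    oe = cpg * n / (c * g) if c and g else 0.0
    return gc, oe


def find_cpg_islands(seq: str, window: int = 200, step: int = 1,
                     min_gc: float = 0.5, min_oe: float = 0.6) -> list[tuple[int, int]]:
    """Scan seq with a fixed window and merge qualifying windows.

    Returns half-open (start, end) intervals in 0-based coordinates.
    """
    s = seq.upper()
    merged: list[list[int]] = []
    for start in range(0, len(s) - window + 1, step):
        gc, oe = window_stats(s[start:start + window])
        if gc > min_gc and oe > min_oe:
            end = start + window
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent window: extend the current island.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
    return [(a, b) for a, b in merged]


seq = "AT" * 300 + "CG" * 200 + "AT" * 300  # CpG-rich core at [600, 1000)
print(find_cpg_islands(seq))                # [(501, 1099)]
```

The reported interval overhangs the true core by about half a window on each side. This is why practical tools add trimming or refinement steps. Recomputing counts per window costs O(n × window). Running prefix sums of C, G, and CpG counts would make the scan linear.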
CpG-rich promoter regions function in transcriptional initiation and chromatin organization. They serve as platforms for the binding of transcription factors and chromatin modifiers. DNA methylation at these regions is dynamically regulated. DNA methyltransferases add methyl groups to cytosines: DNMT1 maintains existing patterns through DNA replication, and DNMT3A and DNMT3B establish new ones. Demethylation pathways involve TET enzymes, which oxidize 5-methylcytosine. Chromatin remodelers and histone modification pathways also interact with CpG-rich regions.
Aberrant methylation of CpG-rich promoter regions is implicated in gene silencing in cancer. For example, hypermethylation of promoter CpG islands can silence tumor suppressor genes such as CDKN2A and MLH1. Epigenetic dysregulation involving these regions also features in studies of imprinting disorders and neurodevelopmental conditions. Clinical epigenomics initiatives at the National Cancer Institute and other translational research programs investigate how hypermethylation and hypomethylation of CpG-rich loci affect prognosis, therapeutic response, and biomarker development, drawing on consortia such as The Cancer Genome Atlas.
Several experimental techniques profile methylation and occupancy at CpG-rich regions:

- Bisulfite sequencing converts unmethylated cytosines to uracil, which is read as thymine, while methylated cytosines are protected.
- Methylated DNA immunoprecipitation (MeDIP) enriches methylated fragments.
- Reduced representation bisulfite sequencing (RRBS) enriches CpG-rich fragments by MspI digestion before bisulfite conversion.
- Single-molecule long-read approaches from Pacific Biosciences and Oxford Nanopore Technologies detect modified bases directly.

Chromatin and transcription factor mapping via ChIP-seq and ATAC-seq reveals the functional states of CpG-rich promoters. These assays use protocols standardized by the ENCODE Project and the Roadmap Epigenomics Consortium. Computational prediction and annotation pipelines have been developed by bioinformatics teams at the European Bioinformatics Institute and the University of California, Santa Cruz, and by industry groups such as Illumina. These pipelines increasingly employ machine-learning methods.
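The logic of bisulfite-based methylation calling can be illustrated with a toy simulation. The sketch assumes complete conversion, a single read aligned base-for-base to the reference top strand, and no sequencing errors. Real pipelines must handle alignment of converted reads, both strands, and incomplete conversion. The names bisulfite_convert and call_cpg_methylation are hypothetical.

```python
def bisulfite_convert(ref: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on the top strand.

    Unmethylated cytosines become uracil and are read as T; cytosines whose
    0-based index is in `methylated` are protected and still read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(ref.upper())
    )


def call_cpg_methylation(ref: str, read: str) -> dict[int, bool]:
    """Call methylation at each CpG of ref from an aligned bisulfite read.

    Returns {position of the CpG cytosine: True if methylated}.
    """
    ref, read = ref.upper(), read.upper()
    calls: dict[int, bool] = {}
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            if read[i] == "C":
                calls[i] = True    # protected from conversion: methylated
            elif read[i] == "T":
                calls[i] = False   # converted: unmethylated
            # Any other base (error or SNP) yields no call.
    return calls


ref = "ACGTTCGACG"                          # CpGs at positions 1, 5, 8
read = bisulfite_convert(ref, methylated={5})
print(read)                                 # ATGTTCGATG
print(call_cpg_methylation(ref, read))      # {1: False, 5: True, 8: False}
```

Summing calls over many reads at one position gives the fraction of methylated molecules at that CpG.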
Comparative genomics analyses across vertebrates, including genomes sequenced by the Genome 10K Project and by primate and avian genome projects, show varying conservation of CpG-rich regions. Evolutionary studies connect changes in CpG density to methylation-influenced mutation. Spontaneous deamination of 5-methylcytosine yields thymine, so methylated CpGs tend to mutate to TpG (or to CpA, as read on the opposite strand) and are gradually lost. CpG islands, which are largely unmethylated in the germline, retain their CpGs. These studies also draw on paleogenomic data, including those generated at the Max Planck Institute for Evolutionary Anthropology. Cross-species comparisons using resources from Ensembl Genomes and the UCSC Genome Browser illuminate lineage-specific retention or loss of CpG-rich regulatory elements.
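The link between methylation and CpG depletion can be shown with a toy mutation model. This is an illustrative sketch, not an evolutionary model from the literature. The per-round mutation probabilities are hypothetical values chosen only to contrast methylated DNA with an unmethylated island. The names obs_exp_cpg and decay_cpgs are also hypothetical. In each round, every CpG mutates with probability p, either C to T (CpG to TpG) or G to A (CpG to CpA), and the observed/expected ratio is then compared.

```python
import random


def obs_exp_cpg(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c, g, cpg = s.count("C"), s.count("G"), s.count("CG")
    return cpg * len(s) / (c * g) if c and g else 0.0


def decay_cpgs(seq: str, p: float, rounds: int, rng: random.Random) -> str:
    """Toy model of deamination-driven CpG loss.

    Each round, every CpG mutates with probability p, becoming TpG
    (C->T on this strand) or CpA (G->A, i.e. C->T on the other strand).
    """
    bases = list(seq.upper())
    for _ in range(rounds):
        # Find CpGs first so each site is considered once per round.
        sites = [i for i in range(len(bases) - 1)
                 if bases[i] == "C" and bases[i + 1] == "G"]
        for i in sites:
            if rng.random() < p:
                if rng.random() < 0.5:
                    bases[i] = "T"      # CpG -> TpG
                else:
                    bases[i + 1] = "A"  # CpG -> CpA
    return "".join(bases)


rng = random.Random(0)
ancestral = "".join(rng.choice("ACGT") for _ in range(5000))
# Hypothetical rates: methylated CpGs mutate far more often than island CpGs.
methylated_bulk = decay_cpgs(ancestral, p=0.05, rounds=20, rng=rng)
island_like = decay_cpgs(ancestral, p=0.002, rounds=20, rng=rng)
for label, s in [("ancestral", ancestral), ("methylated", methylated_bulk),
                 ("island-like", island_like)]:
    print(label, round(obs_exp_cpg(s), 2))
```

Under these assumptions, the methylated sequence's ratio drops well below that of the ancestral random sequence, whose ratio is near 1. The island-like sequence stays close to it.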