GRCh37
Name	GRCh37
Organism	Human (Homo sapiens)
Assembly level	Reference assembly
Release date	February 2009
Version	GRCh37
Previous	NCBI36
Next	GRCh38

Contents

Background and development
Technical features and assembly methods
Genome annotation and reference coordinates
Major updates and patch releases
Impact on research and clinical genomics
Transition to GRCh38 and legacy use cases

GRCh37 is a human reference genome assembly released in 2009 by the Genome Reference Consortium (GRC); the UCSC Genome Browser distributes it under the name hg19. It served as a central coordinate system for genomic research and clinical interpretation. Developed with contributions from institutions including the Wellcome Trust Sanger Institute, the National Center for Biotechnology Information, the European Bioinformatics Institute, and the Broad Institute, GRCh37 provided the sequence representation used by projects such as the 1000 Genomes Project, the ENCODE Project, and the International HapMap Project. Variant catalogues such as dbSNP, clinical resources such as ClinVar, and commercial platforms from companies including Illumina, Thermo Fisher Scientific, and Roche reported data in GRCh37 coordinates.

Background and development

GRCh37 was the first assembly released by the Genome Reference Consortium and the successor to NCBI36 (UCSC hg18). It aimed to improve the representation of human sequence for users such as the Human Genome Project community, the Personal Genome Project, and clinical groups at institutions including Mayo Clinic and Massachusetts General Hospital. Leadership and funding involved organizations such as the Wellcome Trust, the National Institutes of Health, and the European Molecular Biology Laboratory. The assembly process integrated BAC clone libraries, sequence data curated by the International Nucleotide Sequence Database Collaboration, and finished sequence produced by centers such as the Sanger Centre and the Broad Institute Sequencing Platform.

Technical features and assembly methods

GRCh37 combined sequence from clone-based assemblies, whole-genome shotgun reads, and targeted finishing; contributors included laboratories using platforms from Applied Biosystems, Illumina, and Roche 454. The assembly improved the representation of complex regions on chromosomes such as chromosome 1 and chromosome X, and closed gaps present in the earlier builds used by projects such as HapMap Phase II and the 1000 Genomes Project pilot. Tools such as BLAST, BWA, MAQ, and SAMtools were used for alignment, variant calling, and validation of the assembly. Quality control drew on annotations from databases such as UniProt, RefSeq, and Ensembl, and integrated cytogenetic maps from collections at the American College of Medical Genetics and Genomics and the Human Genome Organisation.

Genome annotation and reference coordinates

Annotation of GRCh37 relied on gene models from resources such as Ensembl, GENCODE, and RefSeq to define coordinates for loci, including disease-associated genes catalogued by OMIM and clinically curated entries in ClinVar. Variant databases such as dbSNP and population panels from the 1000 Genomes Project and ExAC reported single-nucleotide polymorphisms and structural variants in GRCh37 coordinates. Transcript annotations such as GENCODE v19 and protein annotations from UniProtKB/Swiss-Prot were mapped to GRCh37. Pathway and functional databases including KEGG, Reactome, and Gene Ontology were linked to these annotations to relate genotype to phenotype across datasets generated at institutions such as the Sanger Institute and the Broad Institute.
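
Because the same locus has different coordinates in different assemblies, a variant reported against GRCh37 is only unambiguous when the assembly name travels with it. The following is a minimal sketch of an assembly-qualified variant key; the `Variant` class, the `normalize_chrom` helper, and the key layout are hypothetical illustrations rather than any database's format, and the position used in the usage note is a placeholder. It also assumes that UCSC-style names ("chr7") and bare names ("7") refer to the same primary chromosome, and it deliberately leaves out mitochondrial and non-primary sequences.

```python
from dataclasses import dataclass

# Assumption: only primary chromosomes 1-22, X and Y are handled.
# Mitochondrial DNA, unplaced scaffolds and alternate loci are left out.
_PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def normalize_chrom(name: str) -> str:
    """Convert a UCSC-style name ("chr7") to the bare style ("7")."""
    bare = name[3:] if name.lower().startswith("chr") else name
    bare = bare.upper()
    if bare not in _PRIMARY:
        raise ValueError(f"unsupported chromosome name: {name!r}")
    return bare


@dataclass(frozen=True)
class Variant:
    """Hypothetical record that keeps the assembly next to the coordinates."""
    assembly: str  # e.g. "GRCh37"
    chrom: str     # "7" or "chr7"
    pos: int       # 1-based position, as in VCF
    ref: str
    alt: str

    def key(self) -> str:
        # Two records with the same key describe the same change on the
        # same assembly, regardless of the chromosome naming style used.
        chrom = normalize_chrom(self.chrom)
        return f"{self.assembly}:{chrom}:{self.pos}:{self.ref}>{self.alt}"
```

With this sketch, `Variant("GRCh37", "chr7", 12345, "A", "T").key()` and `Variant("GRCh37", "7", 12345, "A", "T").key()` produce the same string, while the same position labelled with another assembly does not.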

Major updates and patch releases

Following its initial release, GRCh37 received a series of patch releases, ending with GRCh37.p13, alongside the alternate locus scaffolds maintained by the Genome Reference Consortium and mirrored by International Nucleotide Sequence Database Collaboration repositories such as GenBank and the European Nucleotide Archive. Patches addressed the representation of medically relevant regions, including the HLA loci and complex structural regions implicated in syndromes catalogued by DECIPHER and studied at clinical centers such as St. Jude Children’s Research Hospital. Patches were added as separate scaffolds and did not change the coordinates of the primary chromosomes, so the assembly remained compatible with tools such as Picard, GATK, and VCFtools while enabling the improved mappings used by sequencing consortia including TCGA and the GTEx Project.
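
A pipeline that records the assembly name can use the patch level to decide whether two datasets share primary chromosome coordinates. The sketch below assumes the GRC naming scheme in which a patch release is written as the base name plus `.pN` (for example `GRCh37.p13`), and assumes that patch releases leave primary chromosome coordinates unchanged; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class AssemblyVersion(NamedTuple):
    base: str   # e.g. "GRCh37"
    patch: int  # 0 for the initial release


# Matches "GRCh37" and "GRCh37.p13"; other naming schemes are rejected.
_PATTERN = re.compile(r"^(GRCh\d+)(?:\.p(\d+))?$")


def parse_assembly(name: str) -> AssemblyVersion:
    """Split a GRC-style assembly name into base name and patch number."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a GRC-style assembly name: {name!r}")
    return AssemblyVersion(match.group(1), int(match.group(2) or 0))


def same_primary_coordinates(a: str, b: str) -> bool:
    """True when both names share a base assembly, ignoring the patch level."""
    return parse_assembly(a).base == parse_assembly(b).base
```

Under these assumptions, `same_primary_coordinates("GRCh37", "GRCh37.p13")` is true, while comparing `"GRCh37.p13"` with `"GRCh38"` is false and signals that a coordinate conversion is needed.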

Impact on research and clinical genomics

GRCh37 underpinned large-scale efforts including the 1000 Genomes Project, The Cancer Genome Atlas, the ENCODE Project, and population studies at Kaiser Permanente and the UK Biobank. Clinical laboratories used GRCh37 coordinates for diagnostic panels, for variant interpretation workflows informed by ACMG guidelines, and for reporting systems integrated with electronic health record vendors such as Epic Systems and Cerner Corporation. The assembly supported the development of bioinformatics pipelines at organizations including the Broad Institute and Illumina, and at academic groups at Harvard Medical School, Stanford University, and the University of Cambridge, for research into diseases and variants catalogued in ClinVar and COSMIC.

Transition to GRCh38 and legacy use cases

Although GRCh37 was superseded by GRCh38, the Genome Reference Consortium's next release, in December 2013, many resources and tools continued to support it for continuity in longitudinal studies managed by consortia such as GTEx, TCGA, and the 1000 Genomes Project. Conversion tools such as liftOver and pipelines maintained by the UCSC Genome Browser and Ensembl map coordinates between GRCh37 and newer assemblies such as GRCh38, supporting legacy clinical reports from institutions including Mayo Clinic and Johns Hopkins Medicine. Researchers and clinical laboratories often retain GRCh37 to keep studies published in journals such as Nature, Science, and Genome Research reproducible, while transitions to newer assemblies are coordinated with standards bodies such as the Global Alliance for Genomics and Health and the Clinical Genome Resource.
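
The idea behind coordinate conversion can be illustrated with a toy model: the two assemblies are described as a list of aligned blocks, and a position is carried over by its offset within the block that contains it. This is a simplified sketch, not the liftOver implementation; the real tool reads UCSC chain files and also handles strand and multiple chromosomes. The `Block` type, the `lift` function, and the block values below are hypothetical and are not taken from any real chain file.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    src_start: int  # 0-based, inclusive, in the source assembly
    src_end: int    # 0-based, exclusive
    dst_start: int  # where src_start lands in the target assembly


def lift(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position through aligned blocks.

    Returns None when the position falls in a gap between blocks, which
    corresponds to a position that cannot be converted.
    """
    for block in blocks:
        if block.src_start <= pos < block.src_end:
            return block.dst_start + (pos - block.src_start)
    return None


# Hypothetical same-strand blocks for a single chromosome.
TOY_BLOCKS = [
    Block(0, 1000, 0),        # unchanged region
    Block(1200, 5000, 1500),  # region shifted by +300 in the target
]
```

In this toy model, a position inside the second block is shifted by 300 bases, and a position between the two blocks returns `None`. Positions that fail to convert are one reason legacy datasets are often kept in their original GRCh37 coordinates.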

Category:Human genome