International Genome Sample Resource

Name	International Genome Sample Resource
Abbreviation	IGSR
Formation	2016
Type	Biological database
Headquarters	Hinxton, Cambridgeshire
Parent organization	European Bioinformatics Institute

Contents

History and development
Data content and resources
Access policies and ethical considerations
Infrastructure and data standards
Collaborations and funding
Applications and impact

The International Genome Sample Resource (IGSR) is a public genomic data repository that distributes reference data on human genetic variation derived from global sequencing projects. It succeeds the 1000 Genomes Project, maintaining that project's population-scale genotype and sequence datasets, and supports biomedical research, population genetics, and computational genomics. The resource links legacy datasets with contemporary sequencing efforts and collaborates with major institutes, consortia, and biobanks to maintain standardized, openly accessible variant callsets.

History and development

The resource emerged after the completion of the 1000 Genomes Project and the transition of datasets hosted by the Wellcome Sanger Institute and the European Bioinformatics Institute. Early milestones included integration of the project's phase releases and harmonization with outputs from the International HapMap Project and the Human Genome Project. Leadership and advisory input drew on expertise from groups at the Broad Institute, the National Human Genome Research Institute, and the Max Planck Society. Strategic developments were influenced by policy discussions at the Global Alliance for Genomics and Health, meetings at the Cold Spring Harbor Laboratory, and workshops sponsored by the Wellcome Trust. Over time, the resource incorporated data produced with newer sequencing technologies developed by Illumina, Oxford Nanopore Technologies, and research labs at Stanford University and Harvard Medical School.

Data content and resources

The repository curates variant callsets, phased haplotypes, population allele frequencies, and linkage disequilibrium maps derived from whole-genome and exome sequencing projects. Core datasets trace back to the sample collections of the 1000 Genomes Project populations, to reference panels used by the Haplotype Reference Consortium, and to the resource's predecessor archives. Complementary resources include alignments against the GRCh38 assembly, annotations linked to the Ensembl database, and cross-references with the UCSC Genome Browser. The resource also provides sample metadata and population labels aligned with standards used by the Human Genome Diversity Project and with population descriptors employed in studies funded by the National Institutes of Health and European Commission research programs.
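
As a rough illustration of how population allele frequencies relate to the phased genotypes in such callsets, the following Python sketch counts alternate alleles per population at a single biallelic site. The sample identifiers, population labels, and the function name `allele_frequencies` are hypothetical and are not taken from the resource; genotype strings are assumed to follow the common convention of `0` for the reference allele, `1` for the alternate allele, and `.` for a missing call.

```python
from collections import defaultdict


def allele_frequencies(genotypes, sample_population):
    """Return {population: alternate allele frequency} for one biallelic site.

    genotypes maps sample ID -> genotype string such as "0|1" or "1/1".
    sample_population maps sample ID -> population label.
    Missing alleles (".") are excluded from both numerator and denominator.
    """
    alt_counts = defaultdict(int)
    allele_totals = defaultdict(int)
    for sample, genotype in genotypes.items():
        population = sample_population[sample]
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in genotype.replace("/", "|").split("|"):
            if allele == ".":
                continue
            allele_totals[population] += 1
            if allele == "1":
                alt_counts[population] += 1
    return {
        population: alt_counts[population] / total
        for population, total in allele_totals.items()
        if total > 0
    }


# Hypothetical example data (not real samples or populations).
genotypes = {"S1": "0|1", "S2": "1|1", "S3": "0|0", "S4": ".|1"}
sample_population = {"S1": "POP_A", "S2": "POP_A", "S3": "POP_B", "S4": "POP_B"}
freqs = allele_frequencies(genotypes, sample_population)
print(freqs)
```

In this toy input, POP_A carries three alternate alleles out of four called alleles, and POP_B carries one out of three because one allele is missing. Real callsets contain multiallelic sites and other complications that this sketch does not handle.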

Access policies and ethical considerations

Access and usage policies reflect norms established by the Human Fertilisation and Embryology Authority and guidelines from the World Health Organization and the Global Alliance for Genomics and Health. Data access distinguishes open-access variant frequency data from controlled-access individual-level genotypes. The latter are governed by data access committees, by institutional review boards at the University of Cambridge, and by consent frameworks modeled on those of the European Genome-phenome Archive and the dbGaP system administered by the National Center for Biotechnology Information. Ethical considerations draw on principles discussed at the Council of Europe, on legislation such as the European Union's General Data Protection Regulation, and on national bioethics laws in the countries represented in the sample panels. Community engagement and benefit-sharing dialogues have involved stakeholders from indigenous groups, academic consortia such as the African Genome Variation Project, and advocacy organizations including the Global Genes network.

Infrastructure and data standards

Computational infrastructure relies on cloud and high-performance computing platforms used by the European Molecular Biology Laboratory and the European Bioinformatics Institute, with mirror sites coordinated alongside the National Center for Biotechnology Information and the DNA Data Bank of Japan. Data formats adhere to standards maintained by the Global Alliance for Genomics and Health, such as the Variant Call Format (VCF), and use coordinate systems compatible with assemblies from the Genome Reference Consortium. Metadata schemas align with ontologies and controlled vocabularies from the Gene Ontology consortium and with exchange protocols similar to those implemented by the Ensembl and UCSC Genome Browser projects. Versioning, provenance, and reproducibility are supported through pipelines and workflow standards advocated by the Broad Institute's Terra platform and workflow languages endorsed by the Open Bioinformatics Foundation.
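
To make the Variant Call Format concrete, the sketch below splits one tab-delimited VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and, where present, the FORMAT column and per-sample fields. It is a minimal standard-library illustration, not the resource's own tooling: the function name `parse_vcf_record`, the sample names, and the example record are all made up for demonstration, and production work would normally use an established VCF library instead.

```python
def parse_vcf_record(line, sample_names):
    """Parse one VCF data line (not a header line) into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    # Column 9 (FORMAT) names the colon-separated sub-fields of each sample.
    genotypes = {}
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        for name, sample_field in zip(sample_names, fields[9:]):
            genotypes[name] = dict(zip(format_keys, sample_field.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),          # 1-based position on the reference
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),    # multiple ALT alleles are comma-separated
        "qual": qual,
        "filter": filt,
        "info": info_dict,
        "genotypes": genotypes,
    }


# Hypothetical record for illustration only; values are not real data.
example_line = "chr1\t10177\t.\tA\tAC\t.\tPASS\tAF=0.25;DB\tGT\t0|1\t1|1"
record = parse_vcf_record(example_line, ["S1", "S2"])
print(record["chrom"], record["pos"], record["alt"], record["genotypes"])
```

The parser keeps INFO values as strings and leaves type conversion to the caller, since the types of INFO keys are declared in the VCF header, which this sketch does not read.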

Collaborations and funding

The resource is supported through collaborations among international institutes, including the Wellcome Sanger Institute, the European Bioinformatics Institute, and the Broad Institute, and by funders such as the Wellcome Trust, the National Institutes of Health, and the European Commission. Partnerships extend to biobanks and consortia such as the UK Biobank and the 1000 Genomes Project partners, and to regional initiatives including the African Genome Variation Project and the GenomeAsia 100K Consortium. Further funding has come from grants and programmatic support from UK Research and Innovation and from cross-border research infrastructure investments coordinated through agencies such as the European Research Council.

Applications and impact

The resource underpins diverse applications in human genetics research. These include clinical variant interpretation pipelines used by clinical genetics services at institutions such as Mayo Clinic and Johns Hopkins Hospital, pharmacogenomics investigations with collaborators at Pfizer and Novartis, and population structure analyses informing studies published by teams at Harvard Medical School and the University of Oxford. It has been cited in studies of complex trait genetics, rare disease variant discovery, and methods development in statistical genetics at the Broad Institute. The availability of harmonized reference panels has accelerated the genotype imputation used in genome-wide association studies by groups such as the International HapMap Project collaborators and disease-specific consortia like the Psychiatric Genomics Consortium. The resource continues to influence standards for data sharing and reproducibility in genomics at research centers worldwide, including the Wellcome Sanger Institute and the European Molecular Biology Laboratory.
