
dbVar

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
dbVar
Name: dbVar
Owner: National Center for Biotechnology Information (NCBI)
Type: Structural variation database
Launch: 2010
Country: United States


dbVar is a public archive of genomic structural variation hosted by the National Center for Biotechnology Information (NCBI). It aggregates submitted structural variant calls, generally variants longer than 50 base pairs, from projects, consortia, and institutions to support research in genomics, clinical genetics, and population biology. The resource interoperates with complementary NCBI resources and international archives to provide standardized access to structural variant descriptions and their supporting evidence.

Overview

dbVar serves as an archival repository for structural variants such as deletions, insertions, duplications, inversions, and translocations discovered in human and other species. It is operated by NCBI, a division of the National Library of Medicine, and its records connect to resources including the Sequence Read Archive, assemblies from the Genome Reference Consortium, the International Nucleotide Sequence Database Collaboration, and the European Bioinformatics Institute. Each record captures variant coordinates, variant type, sample provenance, and experimental assay, with links to publications in journals such as Nature, Science, The New England Journal of Medicine, and PLOS Genetics. Major contributors include the 1000 Genomes Project, the Genome Aggregation Database (gnomAD), the Human Genome Structural Variation Consortium, the Broad Institute, and the Wellcome Sanger Institute.

Data Content and Scope

dbVar contains asserted structural variant calls. The archive originally covered multiple species, including Homo sapiens, Mus musculus, Drosophila melanogaster, and Arabidopsis thaliana; NCBI stopped accepting non-human submissions in 2017, so current holdings center on human data. Records capture variant intervals mapped to assemblies maintained by the Genome Reference Consortium, with cross-references to RefSeq, GenBank, and Ensembl annotations. Data sources include large-scale efforts such as the 1000 Genomes Project, gnomAD, the Exome Aggregation Consortium, and The Cancer Genome Atlas, as well as disease-focused groups such as the ClinGen Structural Variant Working Group and the International Cancer Genome Consortium. Associated metadata credit funders such as the National Human Genome Research Institute, the Wellcome Trust, and the Howard Hughes Medical Institute, and cite journal articles by authors at institutions such as Harvard University, Stanford University, and the University of Cambridge.

Submission and Curation Processes

Submitters to dbVar typically include academic centers, sequencing cores, clinical reference laboratories, and research consortia. Submissions follow templates aligned with standards from the International Organization for Standardization and the Global Alliance for Genomics and Health, and with the nomenclature recommendations of the Human Genome Variation Society. Curatorial review involves three kinds of check: validating coordinates against assemblies from the Genome Reference Consortium, cross-checking against primary data in the Sequence Read Archive and ArrayExpress, and assessing supporting evidence generated on platforms from Illumina, Pacific Biosciences, and Oxford Nanopore Technologies. Curators work with ClinVar submitters, the American College of Medical Genetics and Genomics, and journal editors to reconcile variant descriptions and provenance.
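The coordinate check described above can be pictured with a small sketch. This is not dbVar's actual validation code; the function name `validate_interval`, the problem messages, and the sequence lengths in `TOY_ASSEMBLY` are all hypothetical, and the lengths are placeholders rather than real assembly values.

```python
def validate_interval(chrom, start, end, chrom_lengths):
    """Return a list of problems for a 1-based, inclusive interval.

    chrom_lengths maps sequence names to lengths in base pairs.
    An empty list means the interval fits on the named sequence.
    """
    problems = []
    if chrom not in chrom_lengths:
        # Nothing else can be checked without a known sequence length.
        problems.append(f"unknown sequence: {chrom}")
        return problems
    if start < 1:
        problems.append("start is before the first base")
    if end < start:
        problems.append("end precedes start")
    if end > chrom_lengths[chrom]:
        problems.append("end exceeds sequence length")
    return problems


# Placeholder lengths for a made-up assembly, not real GRCh37/GRCh38 values.
TOY_ASSEMBLY = {"chrA": 1_000_000, "chrB": 250_000}
```

A call such as `validate_interval("chrB", 200_000, 300_000, TOY_ASSEMBLY)` reports that the interval runs past the end of the sequence, which is the kind of mismatch that usually indicates coordinates reported against a different assembly version.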

Access and Tools

Users access dbVar through the NCBI Entrez search system, the programmatic E-utilities interface, and bulk downloads from the NCBI FTP site. Visualization and query tools integrate with the NCBI Genome Data Viewer, the UCSC Genome Browser, and the Ensembl browser to display variant intervals alongside gene tracks from RefSeq and GENCODE. Downloaded files can be processed with tools such as BEDTools, SAMtools, and BCFtools, and inspected in IGV alongside supporting read alignments. dbVar also interoperates with knowledgebases such as ClinVar, DECIPHER, and the Database of Genomic Variants to support clinical interpretation and research annotation workflows.
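As an illustration of programmatic access, the sketch below builds an E-utilities ESearch request against the `dbvar` Entrez database using only the Python standard library. The helper names `build_esearch_url` and `search_dbvar` are hypothetical, the search term is only an example, and the JSON layout assumed in `search_dbvar` (`esearchresult` containing `idlist`) is the usual ESearch response shape; treat this as a starting point rather than a reference client.

```python
import json
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20):
    """Build an ESearch URL that queries the dbvar database and asks for JSON."""
    params = {"db": "dbvar", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def search_dbvar(term, retmax=20):
    """Run the search over the network and return the list of matching record IDs."""
    with urllib.request.urlopen(build_esearch_url(term, retmax), timeout=30) as resp:
        payload = json.load(resp)
    # Assumed response layout: {"esearchresult": {"idlist": [...], ...}}
    return payload["esearchresult"]["idlist"]
```

Calling `search_dbvar("BRCA1")` would return Entrez record identifiers that can then be passed to ESummary or EFetch. NCBI asks heavy users to supply an API key and to rate-limit requests, which this sketch omits.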

Data Standards and Formats

dbVar represents structural variants in standardized formats including Variant Call Format (VCF), BED, and ASN.1-based exchange schemas consistent with GenBank and RefSeq. The resource follows nomenclature guidance from the Human Genome Variation Society, metadata schemas promoted by the Global Alliance for Genomics and Health, and the FAIR data principles championed by the European Commission and the Research Data Alliance. Cross-references use accession systems recognized by the International Nucleotide Sequence Database Collaboration, with links to PubMed records, CrossRef, and ORCID identifiers for authorship and provenance tracking.
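To make the VCF representation concrete, the following sketch parses one structural variant data line into a small record. `SVTYPE` and `END` are INFO keys defined by the VCF specification; the `StructuralVariant` class, the `parse_sv_line` function, and the sample line with its `example_sv1` identifier are hypothetical and do not correspond to a real dbVar record.

```python
from dataclasses import dataclass


@dataclass
class StructuralVariant:
    chrom: str
    start: int       # POS column as given in the VCF line
    end: int         # INFO/END, or POS when END is absent
    sv_type: str     # INFO/SVTYPE, or the symbolic ALT allele without brackets
    variant_id: str  # ID column


def parse_sv_line(line):
    """Parse a single tab-separated VCF data line describing a structural variant."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, _ref, alt = fields[:5]
    # INFO is the eighth column: semicolon-separated KEY=VALUE pairs or bare flags.
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    start = int(pos)
    end = int(info.get("END", start))
    sv_type = info.get("SVTYPE", alt.strip("<>"))
    return StructuralVariant(chrom, start, end, sv_type, variant_id)


# Made-up example line; not taken from any dbVar file.
EXAMPLE_LINE = "1\t10000\texample_sv1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=15000"
```

For production use, an established parser such as pysam or cyvcf2 is a safer choice than hand-splitting columns, since real files carry headers, multi-allelic sites, and escaped values that this sketch ignores.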

Use Cases and Applications

Researchers use dbVar for population genetics analyses based on data from the 1000 Genomes Project and gnomAD, for cancer genomics studies that integrate datasets from The Cancer Genome Atlas and the International Cancer Genome Consortium, and for clinical variant interpretation informed by ClinGen and the guidelines of the American College of Medical Genetics and Genomics. Model organism researchers have drawn on dbVar records to study structural variation in Mus musculus and Drosophila melanogaster. Translational teams at academic medical centers use dbVar-linked evidence alongside laboratory assays to inform diagnostic reporting and variant curation, work that is cited in journals such as Genetics in Medicine and Genome Research.
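A common step in these interpretation workflows is asking whether a newly observed variant matches one already archived. One widely used heuristic, shown here as a sketch and not as a method prescribed by dbVar, is reciprocal overlap: two intervals match when the shared region covers at least a given fraction of each. The function names, the record layout, and the 0.5 default threshold are assumptions for illustration.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for 1-based, inclusive (start, end) pairs."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 0.0
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return min(overlap / len_a, overlap / len_b)


def matching_variants(query, archived, threshold=0.5):
    """Return archived records on the same sequence whose reciprocal overlap meets the threshold.

    Each record is a dict with "chrom", "start", and "end" keys.
    """
    return [
        record
        for record in archived
        if record["chrom"] == query["chrom"]
        and reciprocal_overlap(
            (query["start"], query["end"]), (record["start"], record["end"])
        )
        >= threshold
    ]
```

Taking the smaller of the two fractions prevents a short variant nested inside a very large one from counting as a match, which a one-sided overlap test would allow.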

History and Development

NCBI launched dbVar in 2010 as part of a strategy to centralize structural variation data, building on earlier resources such as the Database of Genomic Variants and adopting practices from the International Nucleotide Sequence Database Collaboration. Development involved partnerships with the Genome Reference Consortium, the 1000 Genomes Project, the Wellcome Sanger Institute, and commercial sequencing vendors. Over time dbVar expanded its species coverage, improved its usefulness to clinicians through links to ClinVar, and adopted programmatic access through Entrez and E-utilities, following the model used by PubMed and GenBank. Development continues in collaboration with international stakeholders including the European Bioinformatics Institute, the Global Alliance for Genomics and Health, and major research universities.

Category:Biological databases