dbSNP

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
dbSNP
Name: dbSNP
Title: dbSNP
Producer: National Center for Biotechnology Information
Country: United States
Year: 1998
Disciplines: Genetics; Genomics; Molecular biology

dbSNP is a public database that catalogs genetic variation among and within species, emphasizing single nucleotide polymorphisms and short insertion–deletion events. Developed and maintained by the National Center for Biotechnology Information within the National Library of Medicine, dbSNP aggregates submissions from large projects and individual investigators to support research in human genetics, agriculture, evolutionary biology, and medicine. The resource interoperates with major biomedical resources and consortia to enable variant annotation, population genetics, and clinical research.

History

dbSNP was established in 1998 by the National Center for Biotechnology Information in collaboration with the National Human Genome Research Institute, both part of the National Institutes of Health. It launched while the Human Genome Project was still in progress, before the project's draft sequence was published in 2001. Early contributors included large sequencing centers such as the Wellcome Trust Sanger Institute, the Broad Institute, and the European Bioinformatics Institute. dbSNP went on to integrate data from projects such as the International HapMap Project and the 1000 Genomes Project, and its identifiers came to be used by genome-wide association study consortia and population studies like the UK Biobank. Over time dbSNP expanded to incorporate variant submissions from researchers affiliated with institutions such as Harvard University, Massachusetts Institute of Technology, Stanford University, University of California, Berkeley, and Johns Hopkins University, and to interlink with resources maintained by organizations including the World Health Organization, the European Union, and the National Cancer Institute.

Scope and content

dbSNP focuses on short variants: single nucleotide polymorphisms (SNPs), short insertions and deletions (indels), and small microsatellite variants. It historically covered many taxa, including human, mouse, and major agricultural species studied at institutions like the United States Department of Agriculture and the Food and Agriculture Organization; since 2017 it has accepted only human data, with non-human variants directed to the European Variation Archive. The database stores variant-level records submitted by projects such as the 1000 Genomes Project, clinical sequencing efforts at centers like Mayo Clinic and Cleveland Clinic, and population cohorts including the Framingham Heart Study and the Icelandic deCODE genetics initiative. dbSNP entries cross-reference sequence records from resources maintained by groups like the European Molecular Biology Laboratory and from the GenBank repository at the National Center for Biotechnology Information. Integration extends to annotation efforts by the Ensembl project, the UCSC Genome Browser, and variant interpretation frameworks used by organizations such as the American College of Medical Genetics and Genomics.

Data model and identifiers

dbSNP assigns stable identifiers to variant records: each submission receives a submitted SNP (ss) number, and submissions that describe the same variant at the same location are clustered under a reference SNP (rs) number. These identifiers are integrated with sequence accessions from National Center for Biotechnology Information databases such as RefSeq and with linked external resources such as UniProt. Each record links submitter information from institutions such as the Wellcome Trust Sanger Institute or the Broad Institute and aligns variant coordinates to reference assemblies maintained by the Genome Reference Consortium. Data fields capture allelic context, observed frequencies from projects like the 1000 Genomes Project and the Exome Aggregation Consortium, and clinical annotations that reference standards promoted by bodies such as the American College of Medical Genetics and Genomics and the Clinical Genome Resource. Identifiers support cross-database citation in scholarly literature published in journals like Nature, Science, and The New England Journal of Medicine.
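As a minimal sketch of how such identifiers are handled in downstream code: dbSNP accessions are conventionally written as `rs` followed by digits for reference SNP clusters and `ss` followed by digits for individual submissions. The helper below normalizes and classifies an identifier string. The names `DbSnpId` and `parse_dbsnp_id` are illustrative assumptions, not part of any NCBI library.

```python
import re
from typing import NamedTuple


class DbSnpId(NamedTuple):
    """Hypothetical container for a parsed dbSNP-style identifier."""
    kind: str    # "rs" (reference SNP cluster) or "ss" (submitted SNP)
    number: int  # numeric part of the accession


# Prefix (case-insensitive) followed by one or more ASCII digits.
_ID_PATTERN = re.compile(r"(rs|ss)([0-9]+)", re.IGNORECASE)


def parse_dbsnp_id(text: str) -> DbSnpId:
    """Parse strings such as 'rs123' or ' SS456 ' into a DbSnpId."""
    match = _ID_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a dbSNP-style identifier: {text!r}")
    return DbSnpId(kind=match.group(1).lower(), number=int(match.group(2)))
```

Storing the numeric part separately makes it straightforward to join against tables that key on the bare number, while the `kind` field keeps submissions and clusters from being confused.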

Submission and curation process

Researchers and sequencing centers submit variant data to dbSNP through pipelines developed by the National Center for Biotechnology Information and partner institutions including the European Bioinformatics Institute and the Wellcome Trust Sanger Institute. Batch submissions come from large-scale projects such as the 1000 Genomes Project, clinical laboratories affiliated with Mayo Clinic and university hospitals like Massachusetts General Hospital, and consortia including the International Cancer Genome Consortium. Curatorial staff and automated validation systems check three things: file format, mapping to reference assemblies produced by the Genome Reference Consortium, and basic allele consistency. Flagged records may be reviewed by expert groups associated with organizations like the American College of Medical Genetics and Genomics or the Clinical Genome Resource. Submitter metadata often cites funding from agencies such as the National Science Foundation and the Wellcome Trust.
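The kind of automated checks described above (format, mapping to a reference, and allele consistency) can be sketched roughly as follows. This is not dbSNP's actual validation pipeline: the record fields, the `validate` function, and the in-memory `reference` dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

VALID_BASES = set("ACGT")


@dataclass(frozen=True)
class SubmittedVariant:
    """Hypothetical minimal submission record."""
    contig: str
    position: int  # 1-based coordinate of the first reference base
    ref: str       # allele expected on the reference assembly
    alt: str       # observed alternate allele


def validate(variant: SubmittedVariant, reference: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []

    # Format check: alleles must be non-empty strings over A, C, G, T.
    for label, allele in (("ref", variant.ref), ("alt", variant.alt)):
        if not allele or set(allele.upper()) - VALID_BASES:
            problems.append(f"{label} allele {allele!r} is not a non-empty ACGT string")

    # Allele consistency: a variant must differ from the reference allele.
    if variant.ref.upper() == variant.alt.upper():
        problems.append("ref and alt alleles are identical")

    # Mapping check: the ref allele must match the assembly at the stated position.
    sequence = reference.get(variant.contig)
    if sequence is None:
        problems.append(f"unknown contig {variant.contig!r}")
        return problems
    start = variant.position - 1
    end = start + len(variant.ref)
    if start < 0 or end > len(sequence):
        problems.append("position falls outside the contig")
    elif sequence[start:end].upper() != variant.ref.upper():
        problems.append("ref allele does not match the reference sequence")
    return problems
```

Returning a list of problems rather than raising on the first failure mirrors the idea of flagging records for later review instead of rejecting them outright.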

Access and tools

dbSNP data are accessible through the National Center for Biotechnology Information Entrez search system, programmatically through the Entrez Programming Utilities (E-utilities), and as bulk downloads, with visualization available in browsers like the UCSC Genome Browser and Ensembl. Tools for variant querying and batch retrieval are integrated with analysis platforms used by researchers at institutions such as the Broad Institute, Harvard Medical School, Stanford Medicine, and Cold Spring Harbor Laboratory. Large-scale users draw on cloud-based resources from providers like Amazon Web Services and collaborations with projects such as the Genomic Data Commons and the Global Alliance for Genomics and Health. Educational and community outreach involves organizations including the American Society of Human Genetics and the European Society of Human Genetics.
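For illustration, a programmatic search might look like the sketch below, which uses the E-utilities ESearch endpoint against the Entrez `snp` database and reads the `esearchresult.idlist` field of the JSON response. The function names are assumptions for this example, the search term is passed through as an opaque Entrez query string, and the exact form of the returned identifiers should be confirmed against the current NCBI documentation.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries the Entrez `snp` database."""
    query = urlencode({"db": "snp", "term": term, "retmode": "json", "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def extract_ids(esearch_json: str) -> List[str]:
    """Pull the list of matching record IDs out of an ESearch JSON response."""
    payload = json.loads(esearch_json)
    return list(payload.get("esearchresult", {}).get("idlist", []))


def search_dbsnp(term: str, retmax: int = 20) -> List[str]:
    """Run the search over the network (not exercised offline)."""
    with urlopen(build_esearch_url(term, retmax), timeout=30) as response:
        return extract_ids(response.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to test offline. Heavy users should follow NCBI's request-rate guidelines before calling `search_dbsnp` in a loop.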

Applications and impact

dbSNP underpins genome-wide association studies (GWAS) conducted by consortia like the Wellcome Trust Case Control Consortium, as well as clinical variant interpretation in contexts such as cancer genomics led by the National Cancer Institute and rare-disease programs at centers like Children's Hospital of Philadelphia. Agricultural genetics research at institutions like the United States Department of Agriculture and CIMMYT has drawn on dbSNP for marker development and breeding. Evolutionary studies referencing populations sampled by the 1000 Genomes Project and deCODE genetics use dbSNP to infer population structure and selection. Public health initiatives drawing on genomic surveillance by agencies such as the Centers for Disease Control and Prevention have used dbSNP-referenced variants for pathogen and host studies. Scholarly output citing dbSNP appears in journals such as Nature Genetics and at conferences organized by groups like the American Society of Human Genetics and the European Society of Human Genetics.

Limitations and controversies

dbSNP has faced limitations related to the representativeness of its population sampling, a criticism voiced by researchers at institutions including Harvard University, University of Oxford, and Wellcome Trust Sanger Institute. Merging of submissions has also led to identifier multiplicity and redundancy, an issue debated in forums hosted by the National Institutes of Health and the Global Alliance for Genomics and Health. Concerns about clinical misinterpretation of common versus pathogenic variants have engaged stakeholders such as the American College of Medical Genetics and Genomics and clinical laboratory networks at institutions like Mayo Clinic and Johns Hopkins Hospital. Data privacy and consent debates involving cohorts like the Icelandic deCODE genetics project and biobanks such as the UK Biobank have influenced submission policies coordinated by the National Library of Medicine and international partners. Ongoing efforts by the National Center for Biotechnology Information, the Genome Reference Consortium, and the Global Alliance for Genomics and Health aim to address data quality, provenance, and equitable representation.
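One practical consequence of identifier merging is that an older rs number found in a paper or dataset may need to be mapped to the identifier that now represents the variant. The sketch below follows a merge table until it reaches an identifier that has not itself been merged. The `merged_into` mapping is a hypothetical in-memory structure with made-up numbers; in practice such a table would be built from dbSNP's published merge history.

```python
from typing import Dict


def resolve_current_rs(rs_number: int, merged_into: Dict[int, int]) -> int:
    """Follow merge records until reaching an rs number that was never merged away."""
    seen = set()
    current = rs_number
    while current in merged_into:
        if current in seen:
            # Defensive guard: a well-formed merge history should not loop.
            raise ValueError(f"cycle in merge history at rs{current}")
        seen.add(current)
        current = merged_into[current]
    return current


# Made-up example data: rs101 was merged into rs50, which was merged into rs7.
example_merges = {101: 50, 50: 7}
current = resolve_current_rs(101, example_merges)  # follows 101 -> 50 -> 7
```

Identifiers that never appear as keys in the table are returned unchanged, so the function can be applied uniformly to a mixed list of current and retired identifiers.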
