Database of Genotypes and Phenotypes

Name	Database of Genotypes and Phenotypes
Acronym	dbGaP
Established	2006
Scope	Genotype–phenotype association data
Maintained by	National Center for Biotechnology Information
Country	United States

The Database of Genotypes and Phenotypes (dbGaP) is a federally supported repository for human genotype and phenotype data, maintained by the National Center for Biotechnology Information, that supports large-scale genetic and biomedical research. It was developed within the National Institutes of Health framework and interacts with programs and institutions such as the National Human Genome Research Institute, the National Library of Medicine, the Wellcome Trust, the European Bioinformatics Institute, and the Broad Institute, as well as major clinical cohorts such as the Framingham Heart Study, the UK Biobank, and the Alzheimer's Disease Neuroimaging Initiative.

Introduction

dbGaP was launched amid initiatives including the Human Genome Project, the International HapMap Project, and policy shifts from the White House and the United States Congress regarding genomic data sharing. Key milestones involved collaborations with entities such as the National Institutes of Health, the Department of Health and Human Services, the Food and Drug Administration, and research centers like the Mayo Clinic and Johns Hopkins University. Over time, datasets from projects led by investigators at the University of California, San Francisco, the University of Pennsylvania, and the University of Michigan were integrated, with governance informed by reports from panels convened by the Institute of Medicine and advisory boards including members from the Wellcome Trust and the European Commission.

dbGaP stores genotype arrays, whole-exome and whole-genome sequence data, phenotype questionnaires, clinical measurements, and pedigree files contributed by studies such as the Framingham Heart Study, the Alzheimer's Disease Neuroimaging Initiative, The Cancer Genome Atlas, and the ClinSeq project. Data organization follows metadata standards developed in consultation with groups such as the Global Alliance for Genomics and Health, the European Bioinformatics Institute, and the National Center for Biotechnology Information. Schema linkages reference controlled vocabularies and ontologies used by the National Library of Medicine and interoperable resources such as the Gene Ontology initiative and the Human Phenotype Ontology.

Access mechanisms were defined by National Institutes of Health policy documents, with oversight from the agency's Office of Science Policy, and by legal frameworks influenced by the Health Insurance Portability and Accountability Act and guidance from the Office for Human Research Protections. Data access committees composed of representatives from the National Human Genome Research Institute, the National Cancer Institute, and external experts review applications from investigators at institutions such as Columbia University, Yale University, and the University of California, Los Angeles. Submission processes align with standards advocated by the Global Alliance for Genomics and Health, the Wellcome Trust, and major funders such as the Bill & Melinda Gates Foundation and the Howard Hughes Medical Institute.

dbGaP policy development responded to ethical considerations raised by cases involving researchers at institutions such as the University of South Carolina and the University of Washington, and by panels convened by the Institute of Medicine and the National Bioethics Advisory Commission. Consent frameworks reflect models used in studies such as the Nurses' Health Study, the Framingham Heart Study, and the Women's Health Initiative, and compliance measures reference statutes and guidance from the Department of Health and Human Services and the Office for Human Research Protections. Debates over privacy and re-identification risks engaged stakeholders from the Broad Institute, the Wellcome Sanger Institute, the European Commission, and advocacy groups including the American Civil Liberties Union.

Research enabled by dbGaP has supported discoveries reported by teams at the Broad Institute, the Wellcome Sanger Institute, Harvard Medical School, and Stanford University in areas spanning cardiovascular disease, cancer genomics, neurodegeneration, and psychiatric genetics. Analyses combining dbGaP datasets with resources such as the UK Biobank, the 1000 Genomes Project, and The Cancer Genome Atlas informed publications by consortia such as the Psychiatric Genomics Consortium and the International Cancer Genome Consortium, and by collaborative networks including the Alzheimer's Disease Sequencing Project.

Critiques of dbGaP concern consent scope, data heterogeneity, and access restrictions, issues that have also been raised about the Human Genome Project, the International HapMap Project, and repositories such as the European Genome-phenome Archive. Scholars and advocacy organizations, including the American Civil Liberties Union, ethicists associated with the Institute of Medicine, and policy analysts from the National Academies of Sciences, Engineering, and Medicine, have raised questions about data reuse, about transparency for participants in cohorts such as the Framingham Heart Study and the Nurses' Health Study, and about the balance between the open science promoted by the Wellcome Trust and the privacy protections overseen by the Department of Health and Human Services.
