LLMpedia: The first transparent, open encyclopedia generated by LLMs

JASPAR database

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
JASPAR database
Name: JASPAR
Discipline: Bioinformatics
Subject: Transcription factor binding profiles
Country: International
Providers: University of Oslo; University of Copenhagen; University of British Columbia
Released: 2004


JASPAR is an open-access database of curated, non-redundant transcription factor DNA-binding profiles that supports research in genomics, molecular biology, and computational biology. It supplies position frequency matrices and related profile models used in comparative genomics, functional genomics, and systems biology projects at institutions such as the European Molecular Biology Laboratory, Wellcome Trust Sanger Institute, National Institutes of Health, Howard Hughes Medical Institute, and Max Planck Society. The resource interfaces with tools developed by teams at the European Bioinformatics Institute, University of Cambridge, Stanford University, Massachusetts Institute of Technology, and University of California, Berkeley.

Introduction

JASPAR provides curated, non-redundant collections of transcription factor binding profiles organized by taxonomic group (vertebrates, plants, insects, nematodes, fungi, and urochordates), integrating evidence from experiments by groups at Harvard University, Yale University, Columbia University, University of Oxford, and McGill University. Profiles are stored as position frequency matrices, which software from laboratories at the European Bioinformatics Institute, Cold Spring Harbor Laboratory, Broad Institute, University of California, San Diego, and Carnegie Mellon University converts to position weight matrices for motif scanning. Scans are typically run against genome assemblies and annotations associated with efforts such as the Human Genome Project, 1000 Genomes Project, ENCODE Project, Genome Reference Consortium, and UCSC Genome Browser.

History and Development

JASPAR was first released in 2004 by researchers at Karolinska Institutet in Stockholm, and its maintenance has since been shared among groups at institutions including the University of Copenhagen, the University of British Columbia, and the University of Oslo. Foundational releases coincided with landmark datasets from Saccharomyces Genome Database, Drosophila Genome Project, Arabidopsis Information Resource, and sequencing centers including the Sanger Institute and the Genome Institute at Washington University. Over successive releases, the project collaborated with consortia such as ENCODE Project Consortium, modENCODE, and GTEx Consortium, and with infrastructure partners such as EMBL-EBI, NCBI, and UniProt Consortium, to expand taxonomic breadth and modeling approaches.

Content and Data Model

The database stores transcription factor binding profiles as position frequency matrices and derived models, built from experimental assays conducted at labs such as Cold Spring Harbor Laboratory, Max Planck Institute for Molecular Genetics, Rockefeller University, Johns Hopkins University, and Imperial College London. Each profile links to its source publication in journals associated with Nature Publishing Group, Cell Press, Science, Proceedings of the National Academy of Sciences, and Genome Research, where experimentalists from Stanford University School of Medicine, University College London, Karolinska Institutet, ETH Zurich, and University of Tokyo report chromatin immunoprecipitation (ChIP), SELEX, protein-binding microarray (PBM), and DNase footprinting results. The schema accommodates metadata standards used by Gene Ontology Consortium, Sequence Ontology, and MIAME, and identifiers from RefSeq, UniProt, Ensembl, and Gene Expression Omnibus.
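The matrix representation can be illustrated with a minimal sketch that parses a JASPAR-style count matrix and converts it to log-odds weights. The record below (`EX0001.1`) is invented for illustration and is not a real JASPAR entry; the function names, the pseudocount of 0.8, and the uniform background of 0.25 are assumptions made for this example, not values prescribed by the database.

```python
import math
import re

BASES = "ACGT"

def parse_jaspar_pfm(text):
    """Parse one JASPAR-style record into (header, {base: [counts]})."""
    header = None
    counts = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            continue
        base = line[0].upper()
        # Rows look like "A  [ 4 19  0  0 ]"; keep only the numbers.
        numbers = re.findall(r"\d+(?:\.\d+)?", line[1:])
        counts[base] = [float(n) for n in numbers]
    lengths = {len(row) for row in counts.values()}
    if set(counts) != set(BASES) or len(lengths) != 1:
        raise ValueError("expected four rows (A, C, G, T) of equal length")
    return header, counts

def pfm_to_pwm(counts, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds weights, one column at a time."""
    width = len(counts["A"])
    pwm = {b: [] for b in BASES}
    for i in range(width):
        total = sum(counts[b][i] for b in BASES)
        for b in BASES:
            # The pseudocount is spread evenly over the four bases so that
            # zero counts do not produce log(0).
            freq = (counts[b][i] + pseudocount / 4) / (total + pseudocount)
            pwm[b].append(math.log2(freq / background))
    return pwm

# Hypothetical record, invented for this example.
EXAMPLE = """>EX0001.1 example
A  [ 4 19  0  0 ]
C  [16  0 20  0 ]
G  [ 0  1  0 20 ]
T  [ 0  0  0  0 ]
"""

header, pfm = parse_jaspar_pfm(EXAMPLE)
pwm = pfm_to_pwm(pfm)
```

Positive weights mark bases that are enriched at a position relative to the background, and negative weights mark depleted bases. Established libraries provide their own parsers and normalization choices, which may differ from this sketch.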

Access and Tools

Users retrieve profiles through web interfaces and programmatic APIs, and can use them with tools from Bioconductor, Galaxy Project, UCSC Genome Browser, IGV (Integrative Genomics Viewer), and MEME Suite, which have development groups at University of California, Santa Cruz, European Bioinformatics Institute, Boston University, Baylor College of Medicine, and University of Pennsylvania. JASPAR-compatible libraries and wrappers exist for R and Python, in ecosystems supported by RStudio, the Python Software Foundation, and the Apache Software Foundation, with code hosted on platforms such as GitHub and SourceForge. Integration pipelines tie into resources curated by ArrayExpress, BioGRID, STRING Consortium, Reactome, and KEGG.
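Programmatic access of this kind can be sketched with the Python standard library alone. The base URL, the `matrix/<id>/` path, the `format=json` query parameter, and the identifier pattern below are all assumptions made for illustration; the project's current API documentation should be checked before relying on any of them. `MA0139.1` is used only as a sample identifier.

```python
import json
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: base URL of a JASPAR-style REST service.
API_BASE = "https://jaspar.elixir.no/api/v1/"

# Assumption: identifiers look like a short letter prefix, digits,
# a dot, and a version number (for example "MA0139.1").
MATRIX_ID = re.compile(r"^[A-Z]{2,3}\d{3,4}\.\d+$")

def matrix_url(matrix_id, base=API_BASE):
    """Build the request URL for one profile, validating the identifier."""
    if not MATRIX_ID.match(matrix_id):
        raise ValueError(f"unexpected matrix identifier: {matrix_id!r}")
    return urljoin(base, f"matrix/{matrix_id}/?format=json")

def fetch_matrix(matrix_id, base=API_BASE, timeout=10):
    """Download one profile and return the decoded JSON document."""
    with urlopen(matrix_url(matrix_id, base), timeout=timeout) as response:
        return json.load(response)

url = matrix_url("MA0139.1")
```

Calling `fetch_matrix("MA0139.1")` would perform the network request; the layout of the returned document is not assumed here, so it is best inspected before individual fields are used.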

Applications and Use Cases

Researchers at Broad Institute, Salk Institute, Weizmann Institute of Science, University of Toronto, and Monash University use the profiles for motif scanning to interpret regulatory variants reported by projects such as ClinVar, dbSNP, ExAC, and gnomAD. Studies in developmental biology reference motifs when analyzing data from model organism centers and databases like Waksman Institute, ZFIN, FlyBase, and WormBase. Translational studies at Mayo Clinic, Cleveland Clinic, Memorial Sloan Kettering Cancer Center, and Dana-Farber Cancer Institute use the resource to predict transcription factor perturbations implicated in disease phenotypes cataloged by OMIM, Human Phenotype Ontology, and The Cancer Genome Atlas. Conservation and comparative genomics efforts by Sanger Institute, Leibniz Institute DSMZ, Phytozome, and Ensembl Genomes employ JASPAR matrices to annotate regulatory regions across vertebrates, plants, and fungi.
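Motif scanning itself reduces to sliding a weight matrix along a sequence and scoring each window on both strands. The following is a minimal sketch under stated assumptions: the three-column weight matrix `TOY_PWM`, the sequence, and the threshold are invented for illustration and do not correspond to any JASPAR profile.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def score_window(pwm, window):
    """Sum the per-position weights for one window."""
    return sum(pwm[base][i] for i, base in enumerate(window))

def scan(pwm, sequence, threshold):
    """Return (start, strand, score) for windows scoring >= threshold."""
    width = len(pwm["A"])
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = score_window(pwm, site)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits

# Hypothetical log-odds weights favoring the motif "CAG".
TOY_PWM = {
    "A": [-2.0, 2.0, -2.0],
    "C": [2.0, -2.0, -2.0],
    "G": [-2.0, -2.0, 2.0],
    "T": [-2.0, -2.0, -2.0],
}

hits = scan(TOY_PWM, "TTCAGTTCTGA", threshold=5.0)
```

Here `hits` contains one match on the forward strand (the window `CAG`) and one on the reverse strand (the window `CTG`, whose reverse complement is `CAG`). A variant's predicted effect can be estimated in the same way by scoring the reference and alternate alleles and comparing the two scores.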

Community and Curation

Curation is community-driven, with submissions and reviews by researchers affiliated with University of Helsinki, University of Turku, University of Edinburgh, University of Manchester, and ETH Zurich, coordinated with editorial practices akin to those at PubMed Central and CrossRef. The governance model encourages contributions from principal investigators and consortia including ENCODE Project Consortium, FAANG Consortium, and International Human Epigenome Consortium, and from laboratory groups funded by NIH, Wellcome Trust, and European Research Council. Quality control workflows reference standards used by UniProt Consortium curators and annotation pipelines at EMBL-EBI.

Licensing and Availability

The resource is distributed under open licenses that promote reuse, in line with policies advocated by Creative Commons, Open Science Framework, Wellcome Trust, and European Commission Horizon 2020, and by funding agencies such as National Science Foundation, European Research Council, and National Institutes of Health. Data access is free for academic and industrial groups, including biotech startups incubated by Cambridge Innovation Center and Y Combinator and translational units at Genentech and Amgen. The project's sustainability is supported by grants from organizations like Wellcome Trust, European Molecular Biology Laboratory, and Academy of Finland, and from national research councils such as UK Research and Innovation and National Natural Science Foundation of China.

Category:Biological databases