JASPAR database — LLMpedia

JASPAR database
Name	JASPAR
Discipline	Bioinformatics
Subject	Transcription factor binding profiles
Country	International
Providers	Karolinska Institutet (original release); international consortium of research groups
Released	2004

JASPAR is an open-access database of transcription factor DNA-binding preferences that supports research in genomics, molecular biology, and computational biology. It supplies curated position frequency matrices and profile models used in comparative genomics, functional genomics, and systems biology projects involving institutions such as the European Molecular Biology Laboratory, the Wellcome Trust Sanger Institute, the National Institutes of Health, the Howard Hughes Medical Institute, and the Max Planck Society. The resource interfaces with tools developed by teams at the European Bioinformatics Institute, the University of Cambridge, Stanford University, the Massachusetts Institute of Technology, and the University of California, Berkeley.

Introduction

JASPAR provides curated, non-redundant collections of transcription factor binding profiles for eukaryotes, organized into taxonomic groups such as vertebrates, plants, insects, nematodes, and fungi, integrating evidence from experiments by groups at Harvard University, Yale University, Columbia University, the University of Oxford, and McGill University. Profiles are stored as position frequency matrices, which software from laboratories at the European Bioinformatics Institute, Cold Spring Harbor Laboratory, the Broad Institute, the University of California, San Diego, and Carnegie Mellon University converts to position weight matrices in order to scan for motifs in genome assemblies and annotations associated with the Human Genome Project, the 1000 Genomes Project, the ENCODE Project, the Genome Reference Consortium, and the UCSC Genome Browser.
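
The conversion from counts to scores described above can be sketched as follows. This is a minimal illustration, not JASPAR's own code: the count matrix `TOY_PFM`, the pseudocount of 0.8, and the uniform background of 0.25 are all assumptions chosen for the example, and only the forward strand is scanned.

```python
import math

# Hypothetical 4-column count matrix (made-up values, not a real JASPAR profile).
TOY_PFM = {
    "A": [8, 0, 0, 1],
    "C": [1, 0, 9, 0],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 1, 8],
}


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    width = len(pfm["A"])
    pwm = {base: [] for base in "ACGT"}
    for i in range(width):
        # Column total plus the pseudocount mass spread over the four bases.
        total = sum(pfm[b][i] for b in "ACGT") + pseudocount
        for b in "ACGT":
            freq = (pfm[b][i] + pseudocount * background) / total
            pwm[b].append(math.log2(freq / background))
    return pwm


def scan(sequence, pwm):
    """Score every forward-strand window; windows with non-ACGT symbols are skipped."""
    width = len(pwm["A"])
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in "ACGT" for ch in window):
            continue
        score = sum(pwm[ch][i] for i, ch in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_hit(sequence, pwm):
    """Return the (start, window, score) tuple with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[2])
```

Calling `best_hit("TTAGCTGG", pfm_to_pwm(TOY_PFM))` picks out the window matching the toy consensus. A practical scanner would also score the reverse complement and apply a score threshold.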

JASPAR was first released in 2004 by researchers at Karolinska Institutet in Stockholm, and later releases have been maintained by an international group of collaborating laboratories. Early releases coincided with landmark datasets from the Saccharomyces Genome Database, the Drosophila Genome Project, The Arabidopsis Information Resource, and sequencing centers including the Sanger Institute and the Genome Institute at Washington University. Over successive releases the project collaborated with consortia such as the ENCODE Project Consortium, modENCODE, and the GTEx Consortium, and with infrastructure partners such as EMBL-EBI, NCBI, and the UniProt Consortium, to expand its taxonomic breadth and modeling approaches.

Users retrieve profiles through web interfaces and programmatic APIs that work alongside tools from Bioconductor, the Galaxy Project, the UCSC Genome Browser, IGV (Integrative Genomics Viewer), and the MEME Suite, which have development groups at the University of California, Santa Cruz, the European Bioinformatics Institute, Boston University, Baylor College of Medicine, and the University of Pennsylvania. JASPAR-compatible libraries and wrappers exist for language ecosystems associated with RStudio, the Python Software Foundation, and the Apache Software Foundation, with code hosted on platforms such as GitHub and SourceForge. Integration pipelines tie into resources curated by ArrayExpress, BioGRID, the STRING Consortium, Reactome, and KEGG.
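
Programmatic use often starts by reading a downloaded matrix file. The sketch below parses the bracketed text layout commonly called "JASPAR format" into Python dictionaries. It is an illustrative parser written for this article, not an official client, and the record `TOY0001.1 ToyFactor` with its counts is a made-up placeholder rather than a real profile.

```python
import re

# Hypothetical record in the bracketed layout; identifier and counts are invented.
TOY_JASPAR_TEXT = """\
>TOY0001.1 ToyFactor
A  [ 8  0  0  1 ]
C  [ 1  0  9  0 ]
G  [ 0 10  0  1 ]
T  [ 1  0  1  8 ]
"""


def parse_jaspar(text):
    """Parse one or more '>ID NAME' records followed by A/C/G/T count rows."""
    profiles = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: identifier, then an optional free-text name.
            parts = line[1:].split(None, 1) or [""]
            current = {
                "id": parts[0],
                "name": parts[1] if len(parts) > 1 else "",
                "counts": {},
            }
            profiles.append(current)
        elif current is not None and line[0] in "ACGT":
            # Count row: base letter, then numbers (brackets are ignored).
            numbers = re.findall(r"\d+(?:\.\d+)?", line[1:])
            current["counts"][line[0]] = [float(n) for n in numbers]
    return profiles


def consensus(profile):
    """Most frequent base per column (ties resolved in A, C, G, T order)."""
    counts = profile["counts"]
    width = len(counts["A"])
    return "".join(max("ACGT", key=lambda b: counts[b][i]) for i in range(width))
```

For the toy record, `consensus(parse_jaspar(TOY_JASPAR_TEXT)[0])` returns the most frequent base at each position. Established libraries such as Biopython provide their own motif readers, which are preferable to a hand-written parser in real pipelines.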

Researchers at the Broad Institute, the Salk Institute, the Weizmann Institute of Science, the University of Toronto, and Monash University use the profiles for motif scanning to interpret regulatory variants reported by resources such as ClinVar, dbSNP, ExAC, and gnomAD. Studies in developmental biology reference the motifs when analyzing data from model organism resources such as the Waksman Institute, ZFIN, FlyBase, and WormBase. Translational studies at the Mayo Clinic, the Cleveland Clinic, Memorial Sloan Kettering Cancer Center, and the Dana-Farber Cancer Institute use the resource to predict transcription factor perturbations implicated in disease phenotypes cataloged by OMIM, the Human Phenotype Ontology, and The Cancer Genome Atlas. Conservation and comparative genomics efforts at the Sanger Institute, the Leibniz Institute DSMZ, Phytozome, and Ensembl Genomes employ JASPAR matrices to annotate regulatory regions across vertebrates, plants, and fungi.
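
One common way to use a profile for variant interpretation is to compare the best motif score around a position for the reference and alternate alleles. The sketch below shows that idea under stated assumptions: `TOY_COUNTS` is a made-up matrix, the pseudocount and uniform background are arbitrary choices, only the forward strand is considered, and the function names are hypothetical rather than taken from any published tool.

```python
import math

# Hypothetical count matrix (made-up values, not a real JASPAR profile).
TOY_COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [1, 0, 9, 0],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 1, 8],
}


def to_log_odds(counts, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    width = len(counts["A"])
    pwm = {base: [] for base in "ACGT"}
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + pseudocount
        for b in "ACGT":
            freq = (counts[b][i] + pseudocount * background) / total
            pwm[b].append(math.log2(freq / background))
    return pwm


def best_score_overlapping(seq, pos, pwm):
    """Best score among forward-strand windows that cover index `pos`.

    Assumes `seq` contains only A, C, G, T and is at least as long as the motif.
    """
    width = len(pwm["A"])
    first = max(0, pos - width + 1)
    last = min(pos, len(seq) - width)
    return max(
        sum(pwm[ch][i] for i, ch in enumerate(seq[s:s + width]))
        for s in range(first, last + 1)
    )


def variant_delta(ref_seq, pos, alt_base, pwm):
    """Alternate-minus-reference best score; negative values suggest a weakened site."""
    ref_seq = ref_seq.upper()
    alt_seq = ref_seq[:pos] + alt_base.upper() + ref_seq[pos + 1:]
    return (best_score_overlapping(alt_seq, pos, pwm)
            - best_score_overlapping(ref_seq, pos, pwm))
```

With the toy matrix, substituting the strongly preferred G at index 3 of `"TTAGCTGG"` yields a negative delta, while the reverse substitution yields a positive one. A score change alone does not establish a functional effect; it only flags candidates for follow-up.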

Curation is community-driven, with submissions and reviews by researchers affiliated with the University of Helsinki, the University of Turku, the University of Edinburgh, the University of Manchester, and ETH Zurich, coordinated with editorial practices akin to those at PubMed Central and CrossRef. The governance model encourages contributions from principal investigators and consortia, including the ENCODE Project Consortium, the FAANG Consortium, the International Human Epigenome Consortium, and laboratory groups supported by the NIH, the Wellcome Trust, and the European Research Council. Quality control workflows reference standards used by UniProt Consortium curators and annotation pipelines at EMBL-EBI.

The resource is distributed under open licenses that promote reuse, in line with policies advocated by Creative Commons, the Open Science Framework, the Wellcome Trust, and the European Commission's Horizon 2020 programme, and by funding agencies such as the National Science Foundation, the European Research Council, and the National Institutes of Health. Data access is free for academic and industrial groups, including biotech startups incubated by Cambridge Innovation Center and Y Combinator and translational units at Genentech and Amgen. The project’s sustainability is supported by grants from organizations such as the Wellcome Trust, the European Molecular Biology Laboratory, and the Academy of Finland, and from national research councils such as UK Research and Innovation and the National Natural Science Foundation of China.

Category:Biological databases