Decipher (database)

Name	Decipher
Title	Decipher (database)
Developer	Ocean Informatics
Released	2004
Latest release version	3.8
Programming language	Python, SQL
Operating system	Linux, Windows
Genre	Biological sequence database
License	Proprietary / academic

Contents

Overview
History and development
Data content and scope
Access and querying
Applications and impact
Licensing and governance

Decipher is a curated biological sequence repository and annotation platform designed for microbial pathogen surveillance, antimicrobial resistance analysis, and comparative genomics. The system combines sequence storage, metadata schemas, and query services to support researchers at institutions such as the Wellcome Sanger Institute, the Centers for Disease Control and Prevention, and Public Health England (now the UK Health Security Agency). Decipher integrates with laboratory workflows used by organizations such as the World Health Organization, the European Centre for Disease Prevention and Control, and the National Institutes of Health.

Overview

Decipher provides a centralized environment for storing nucleotide sequences, protein annotations, phenotypic metadata, and lineage assignments for bacteria, viruses, and eukaryotic pathogens. The platform supports the sequence data types produced by projects at the Broad Institute, the J. Craig Venter Institute, and the Max Planck Institute, and interoperates with the data models used by GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan. Users can link entries to the nomenclature and typing schemes maintained by the International Committee on Taxonomy of Viruses, the International Committee on Systematics of Prokaryotes, and PulseNet. The architecture uses relational schemas and ontologies that mirror standards advanced by the Global Microbial Identifier initiative, the Global Alliance for Genomics and Health, and the Research Data Alliance.
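The relational layout described above can be illustrated with a minimal sketch. It assumes a two-table design in which each isolate carries a lineage assignment and owns one or more sequences. The table names, column names, and sample rows are hypothetical and are not taken from Decipher's actual schema; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not Decipher's real data model.
SCHEMA = """
CREATE TABLE isolate (
    isolate_id  INTEGER PRIMARY KEY,
    taxon_name  TEXT NOT NULL,
    lineage     TEXT
);
CREATE TABLE sequence (
    sequence_id INTEGER PRIMARY KEY,
    isolate_id  INTEGER NOT NULL REFERENCES isolate(isolate_id),
    seq_type    TEXT NOT NULL CHECK (seq_type IN ('nucleotide', 'protein')),
    residues    TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up isolate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO isolate VALUES (1, 'Escherichia coli', 'ST131')")
    conn.executemany(
        "INSERT INTO sequence (isolate_id, seq_type, residues) VALUES (?, ?, ?)",
        [(1, "nucleotide", "ATGGCC"), (1, "protein", "MA")],
    )
    conn.commit()
    return conn


def sequences_for_lineage(conn, lineage):
    """Return (seq_type, residues) rows for every isolate in a lineage."""
    rows = conn.execute(
        "SELECT s.seq_type, s.residues FROM sequence s "
        "JOIN isolate i ON i.isolate_id = s.isolate_id "
        "WHERE i.lineage = ? ORDER BY s.sequence_id",
        (lineage,),
    )
    return rows.fetchall()
```

Keeping lineage on the isolate row rather than on each sequence means a lineage reassignment touches a single record; a production schema would likely need separate tables for metadata and ontology terms.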

History and development

Development began in the early 2000s as part of collaborative efforts between academic groups and public health laboratories, influenced by sequencing initiatives at the Sanger Institute, the Broad Institute, and Cold Spring Harbor Laboratory. Early funding and methodology exchanges involved the Wellcome Trust, the Howard Hughes Medical Institute, and the European Molecular Biology Laboratory. Over successive releases, Decipher incorporated algorithms and annotation tools such as BLAST from the National Center for Biotechnology Information, the MAFFT multiple sequence aligner, and the Prokka genome annotation pipeline. Contributions from consortia such as the Global Outbreak Alert and Response Network and the International Society for Infectious Diseases shaped its surveillance features. Commercial partners and vendors, including Illumina, Thermo Fisher Scientific, and Oxford Nanopore Technologies, influenced pipeline compatibility.

Data content and scope

The database stores raw reads, assembled contigs, annotated genomes, multilocus sequence typing profiles, plasmid sequences, and antimicrobial resistance determinants. Content types are compatible with tools and databases including CARD, ResFinder, VFDB, ISFinder, and Pfam. Taxonomic coverage spans entries linked to taxa recognized by NCBI Taxonomy, UniProtKB, and SILVA, and includes isolates associated with outbreaks recorded in datasets curated by the Global Influenza Surveillance and Response System, GISAID partners, and PulseNet networks. Metadata fields capture isolate provenance tied to institutions such as the London School of Hygiene & Tropical Medicine, Institut Pasteur, and US Food and Drug Administration, as well as temporal and geographic annotations harmonized with standards from the International Organization for Standardization and World Health Organization.
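As an illustration of the metadata harmonization described above, the following sketch normalizes a collection date and a country name. It assumes that the relevant ISO standards are ISO 8601 dates and ISO 3166-1 alpha-2 country codes, which the text does not state explicitly. The field names, the accepted input formats, and the small lookup table are all hypothetical.

```python
from datetime import datetime

# Assumed input formats; a real curation pipeline would need a longer list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

# Tiny illustrative lookup, not a complete ISO 3166-1 table.
COUNTRY_CODES = {"united kingdom": "GB", "france": "FR", "united states": "US"}


def normalize_date(raw):
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_country(name):
    """Return a two-letter country code, or None if the name is unknown."""
    return COUNTRY_CODES.get(name.strip().lower())


def harmonize(record):
    """Return a copy of the record with date and country normalized."""
    out = dict(record)
    out["collection_date"] = normalize_date(record["collection_date"])
    out["country"] = normalize_country(record["country"])
    return out
```

Returning None for an unknown country, rather than raising, leaves the record intact so that a curator can review it later; unparseable dates raise because a wrong date silently corrupts temporal analyses.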

Access and querying

Decipher supports web-based portals, RESTful APIs, and SQL interfaces that enable programmatic access comparable to the services offered by EMBL-EBI, NCBI, and DDBJ. Query mechanisms include sequence similarity search using BLAST+, k-mer search inspired by tools from the Broad Institute, and phylogenetic placement compatible with software such as RAxML, IQ-TREE, and BEAST. Users authenticate through federated identity services such as ORCID, ELIXIR, and eduGAIN, and role-based access control reflects the policies of agencies such as the UK Health Security Agency and the Centers for Disease Control and Prevention. Data can be exported in the FASTA, FASTQ, GFF, and VCF formats used in pipelines built with the Galaxy Project, Nextflow, and Snakemake.
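The k-mer search mentioned above can be sketched as follows. This example ranks stored sequences by the Jaccard similarity of their k-mer sets, which is one common approach and is not necessarily the method Decipher uses. The function names, the default k, and the score threshold are assumptions chosen for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def kmer_search(query, database, k=4, min_score=0.1):
    """Rank database entries by k-mer similarity to the query.

    `database` maps entry names to sequences. Entries scoring below
    `min_score` are dropped; ties are broken by entry name.
    """
    query_kmers = kmers(query, k)
    hits = []
    for name, seq in database.items():
        score = jaccard(query_kmers, kmers(seq, k))
        if score >= min_score:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Comparing exact k-mer sets is fast but memory-hungry for whole genomes; tools in this space often subsample the k-mers (for example with MinHash sketches) to keep the index small.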

Applications and impact

Decipher is used for outbreak investigation by public health laboratories, antimicrobial resistance surveillance by clinical microbiology units, and evolutionary studies by research groups at universities and institutes such as Harvard University, the University of Oxford, and the Massachusetts Institute of Technology. It has been integrated into surveillance networks responding to events associated with pathogens studied at the Institut Pasteur, the Walter Reed Army Institute of Research, and the Chinese Center for Disease Control and Prevention. Case studies describe its role in tracking transmission chains analyzed alongside methods from the European Society of Clinical Microbiology and Infectious Diseases and the International Virology Congress. The platform supports surveillance programs funded by agencies including the Bill & Melinda Gates Foundation, the Wellcome Trust, and the National Science Foundation, informing policy decisions by ministries of health and international bodies such as WHO and ECDC.

Licensing and governance

Operational governance involves stakeholder advisory boards drawn from academic institutions, public health agencies, and industry partners such as Illumina and Thermo Fisher Scientific. Licensing models combine academic-use agreements and commercial licenses comparable to arrangements used by EMBL-EBI resources and private-sector platforms. Data-sharing policies are aligned with the Bermuda Principles and the FAIR Data Principles advocated by the Research Data Alliance, and with privacy requirements under the General Data Protection Regulation and HIPAA when human-associated metadata are present. Stewardship and sustainability efforts coordinate with consortia such as ELIXIR, the Global Alliance for Genomics and Health, and national bioinformatics centers to ensure long-term availability.

Category:Biological databases