BLAT

Name	BLAT
Developer	Jim Kent; University of California, Santa Cruz
Initial release	2000
Operating system	Unix-like, Microsoft Windows
License	Source available; free for academic, nonprofit, and personal use (commercial use requires a license)

Contents

History
Purpose and Functionality
Algorithm and Implementation
Usage and Applications
Performance and Limitations
Related Tools and Comparisons

BLAT (BLAST-like Alignment Tool) is a bioinformatics tool for rapid alignment and mapping of nucleotide and protein sequences. It was developed to provide fast similarity searches across large genomic assemblies, so that projects in genomics, transcriptomics, and comparative biology can map sequences to reference genomes and annotated assemblies. The tool occupies a niche between full database search systems and short-read aligners, and it supports workflows in laboratories, genome centers, and computational facilities.

History

BLAT was written by Jim Kent at the University of California, Santa Cruz to support annotation efforts for the Human Genome Project and related genome projects. Early use cases included mapping complementary DNA from projects at the National Center for Biotechnology Information and collaborative work with teams at the European Bioinformatics Institute and at sequencing centers such as the Wellcome Sanger Institute and the Broad Institute. The design emphasized speed at a modest memory cost, both to serve interactive genome browsers such as the UCSC Genome Browser and to integrate with pipelines from groups associated with the National Institutes of Health and academic sequencing consortia.

Adoption subsequently spread through academic labs at institutions including Stanford University, Harvard University, and the Massachusetts Institute of Technology, and at international centers such as the Max Planck Society and the European Molecular Biology Laboratory. Over time, BLAT was compared with search systems from the National Center for Biotechnology Information and with alignment packages developed by teams at the Centre for Genomic Regulation and by commercial vendors. Community forks, ports, and wrappers emerged to support infrastructure at organizations like Ensembl and cloud deployments used by the Genome Reference Consortium.

Purpose and Functionality

BLAT’s core purpose is rapid alignment of query sequences (such as expressed sequence tags, complementary DNAs, mRNAs, or proteins) to large target sequences like chromosomes, contigs, or assembled genomes produced by projects at the Joint Genome Institute and by sequencing efforts using platforms from Illumina and Pacific Biosciences. It performs nucleotide-to-nucleotide, protein-to-protein, and translated searches, enabling mapping of coding sequences from projects at centers like Cold Spring Harbor Laboratory and the Salk Institute.

Functionally, BLAT builds an index of non-overlapping k-mers from the target assembly. Indexing the target once, rather than rescanning it for every query, reduces query time for users at universities, commercial sequencing groups, and public resources like GenBank. BLAT reports alignment coordinates, percent identity, and an inferred intron/exon structure, which bioinformaticians at institutions such as Johns Hopkins University and Yale University use to integrate transcript evidence into gene models and annotation portals like those maintained by RefSeq and community annotation efforts.
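The indexing idea can be illustrated with a minimal Python sketch. It is not BLAT's C implementation: the function names `build_index` and `find_seeds` are hypothetical, and the sketch assumes exact k-mer matches only, with the target sampled every k bases and the query scanned at every offset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(target: str, k: int = 11) -> Dict[str, List[int]]:
    """Map each non-overlapping k-mer of the target to its start positions."""
    index = defaultdict(list)
    # Step by k so that the indexed k-mers do not overlap.
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(index: Dict[str, List[int]], query: str, k: int = 11) -> List[Tuple[int, int]]:
    """Return (query_pos, target_pos) pairs for every exact k-mer hit."""
    seeds = []
    # Slide over the query one base at a time (overlapping k-mers), so a hit
    # is found whatever the query's offset relative to the target's k-mer grid.
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], ()):
            seeds.append((qpos, tpos))
    return seeds
```

Seeds that share a similar diagonal (target position minus query position) would then be clumped and extended into full alignments; that step is omitted here.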

Algorithm and Implementation

BLAT’s algorithm constructs a hash table of k-mer positions from the target sequence, trading memory for speed so that candidate alignment seeds can be found rapidly. The tactic is conceptually related to seeding approaches used in tools developed at the European Bioinformatics Institute and by academic groups at the University of California, Berkeley. For translated searches, BLAT translates nucleotide sequences across reading frames and matches them against an indexed peptide representation, similar in spirit to ideas from alignment research at the University of Washington.
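Translation across reading frames can be sketched as follows. This is an illustrative Python example, not BLAT's own code: it assumes the standard genetic code, unambiguous A/C/G/T input (anything else becomes `X`), and hypothetical helper names.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Return peptides for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames
```

Each translated frame could then be indexed or searched with peptide k-mers in the same way as nucleotide k-mers, with a shorter k because the amino-acid alphabet is larger.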

The implementation is primarily in C, compiled for Unix-like and Microsoft Windows environments, and optimized for single-server interactive use in genome browsers such as the UCSC Genome Browser. The program includes utilities to create two types of databases, one for genomes and another for proteins, enabling integration with pipelines run on clusters managed by projects at Argonne National Laboratory or on cloud infrastructure such as the European Open Science Cloud. The output format supports downstream parsing and visualization by tools developed at institutions such as the Broad Institute and by community resources like the Galaxy Project.
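As an illustration of downstream parsing, the sketch below reads one alignment line in PSL, BLAT's default tab-separated output format with 21 columns. The record class, function names, and sample values are made up for the example. It assumes a plus-strand nucleotide alignment; negative-strand and translated alignments need extra coordinate handling that is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

PSL_COLUMNS = 21  # tab-separated fields in one PSL alignment line


@dataclass
class PslRecord:
    matches: int
    mismatches: int
    rep_matches: int
    strand: str
    q_name: str
    q_start: int
    q_end: int
    t_name: str
    t_start: int
    t_end: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(field: str) -> List[int]:
    # PSL list fields are comma-separated and end with a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> PslRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) != PSL_COLUMNS:
        raise ValueError(f"expected {PSL_COLUMNS} fields, got {len(f)}")
    return PslRecord(
        matches=int(f[0]), mismatches=int(f[1]), rep_matches=int(f[2]),
        strand=f[8], q_name=f[9], q_start=int(f[11]), q_end=int(f[12]),
        t_name=f[13], t_start=int(f[15]), t_end=int(f[16]),
        block_sizes=_int_list(f[18]), q_starts=_int_list(f[19]),
        t_starts=_int_list(f[20]),
    )


def target_gaps(rec: PslRecord) -> List[Tuple[int, int]]:
    """Target intervals between consecutive blocks (candidate introns)."""
    gaps = []
    for i in range(len(rec.block_sizes) - 1):
        gap_start = rec.t_starts[i] + rec.block_sizes[i]
        gap_end = rec.t_starts[i + 1]
        if gap_end > gap_start:
            gaps.append((gap_start, gap_end))
    return gaps


def simple_identity(rec: PslRecord) -> float:
    """Percent of aligned bases that match; ignores gap penalties."""
    aligned = rec.matches + rec.rep_matches + rec.mismatches
    return 100.0 * (rec.matches + rec.rep_matches) / aligned if aligned else 0.0


# Made-up record: a 250-base query aligned in two blocks to a hypothetical chr1.
SAMPLE = "\t".join([
    "250", "0", "0", "0", "0", "0", "1", "500", "+", "query1", "250", "0",
    "250", "chr1", "100000", "1000", "1750", "2", "100,150,", "0,100,",
    "1000,1600,",
])
```

Calling `target_gaps(parse_psl_line(SAMPLE))` yields the single 500-base target gap between the two blocks. Note that `simple_identity` is a deliberately naive measure and does not reproduce the gap-aware percent identity shown by genome browser front ends.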

Usage and Applications

BLAT is widely used for tasks including mapping mRNA sequences in transcriptome projects at the Allen Institute for Brain Science and aligning expressed sequence tags generated in consortia like the 1000 Genomes Project. It helps identify exon–intron boundaries for gene model refinement in annotation projects by teams at GENCODE, and it supports validation of assemblies produced with technologies from Oxford Nanopore Technologies and Pacific Biosciences.

Other applications include cross-species alignments in comparative genomics studies conducted by researchers at the Smithsonian Institution and ecological genomics groups at the University of California, Davis, as well as rapid identification of genomic loci for cloning projects in labs affiliated with Cold Spring Harbor Laboratory and in pharmaceutical research at companies collaborating with National Institutes of Health programs.

Performance and Limitations

BLAT excels in speed for medium-length queries against large target assemblies, often returning results quickly enough for interactive use in genome browsers such as the UCSC Genome Browser and in visualization platforms used by groups at Ensembl. Its memory usage can be substantial for whole-genome indexes, because the index stores a position for every indexed k-mer in the target; this is a consideration in compute environments managed by centers like Lawrence Berkeley National Laboratory.
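A back-of-the-envelope estimate shows why whole-genome indexes are large and why non-overlapping k-mers help. The Python sketch below is an assumption-laden approximation, not a measurement of BLAT: it assumes one stored position per indexed k-mer, a hypothetical 4 bytes per position, and no hash-table overhead or repeat filtering.

```python
def index_entries(genome_bases: int, k: int, overlapping: bool = False) -> int:
    """Number of k-mer positions stored for a target of the given size."""
    if genome_bases < k:
        return 0
    if overlapping:
        return genome_bases - k + 1  # one k-mer starting at every base
    return genome_bases // k  # one k-mer every k bases


def index_bytes(genome_bases: int, k: int = 11, bytes_per_position: int = 4,
                overlapping: bool = False) -> int:
    """Rough size of the position list alone (assumed 4 bytes per entry)."""
    return index_entries(genome_bases, k, overlapping) * bytes_per_position


if __name__ == "__main__":
    genome = 3_000_000_000  # roughly human-sized target, for illustration
    for mode in (False, True):
        gib = index_bytes(genome, overlapping=mode) / 2**30
        label = "overlapping" if mode else "non-overlapping"
        print(f"{label:>16}: {gib:.1f} GiB")
```

Under these assumptions, a 3-gigabase target with k = 11 needs about 1 GiB for non-overlapping k-mers, compared with about 11 GiB if every position were indexed.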

Limitations include reduced sensitivity for very short reads, such as those produced by Illumina platforms, when compared with specialized short-read aligners from teams at the Broad Institute or tools optimized by groups at the European Bioinformatics Institute. BLAT also trades exhaustive scoring for speed, making it less appropriate for applications that require maximal alignment sensitivity, such as variant-calling pipelines developed by projects like the 1000 Genomes Project and Genome in a Bottle.
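The loss of sensitivity on short queries follows from simple seed statistics. The sketch below is a simplified model, not BLAT's actual scoring: it assumes mismatches are independent, that a homologous region of a given length contains `length // k` non-overlapping target k-mers, and that each must match perfectly to count as a hit. The function name and parameters are hypothetical.

```python
from math import comb


def prob_at_least_n_seeds(identity: float, length: int, k: int = 11, n: int = 1) -> float:
    """Probability that at least n of the region's target k-mers match exactly."""
    tiles = length // k  # non-overlapping target k-mers inside the region
    p = identity ** k    # chance that one k-mer matches perfectly
    # Binomial tail: subtract the probability of seeing fewer than n hits.
    fewer = sum(comb(tiles, i) * p**i * (1 - p) ** (tiles - i) for i in range(n))
    return max(0.0, 1.0 - fewer)


if __name__ == "__main__":
    for length in (36, 100, 500):
        one = prob_at_least_n_seeds(0.95, length, n=1)
        two = prob_at_least_n_seeds(0.95, length, n=2)
        print(f"length={length:4d}  P(>=1 hit)={one:.3f}  P(>=2 hits)={two:.3f}")
```

In this model a short read spans only a few target k-mers, so a small number of mismatches can eliminate every seed, whereas a several-hundred-base query has many independent chances to hit.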

Related Tools and Comparisons

BLAT is often compared with search and alignment systems such as BLAST, provided by the National Center for Biotechnology Information, and with short-read aligners such as BWA, originally developed at the Wellcome Sanger Institute, and Bowtie, originally developed at the University of Maryland. Other related tools include aligners optimized for long reads from platforms like Oxford Nanopore Technologies and packages developed by academic teams at the University of Maryland and the University of Tokyo. For genome browser integration, BLAT historically complements services offered by Ensembl and custom aligners used by resource projects such as RefSeq and the Genome Reference Consortium.

