Lastz — LLMpedia

Lastz
Name	Lastz
Author	Robert S. Harris
Developer	Pennsylvania State University (Miller Lab)
Released	2007
Latest release	1.02.00
Operating system	Unix-like
License	MIT License

Contents

Overview
Methods and Algorithm
Input and Output Formats
Performance and Accuracy
Applications
Development and Availability

Lastz (usually styled LASTZ) is a pairwise DNA alignment program used for aligning long genomic sequences such as chromosomes and contigs. It was developed as a replacement for BLASTZ and is designed for large-scale alignments, including whole eukaryotic chromosomes, bacterial genomes, and assembled sequence data. The software emphasizes sensitivity and configurable scoring, offers options for handling repeats and rearrangements, and serves as the pairwise alignment step in a number of comparative genomics pipelines.

Overview

Lastz originated as a drop-in replacement for BLASTZ, the pairwise aligner used for whole-genome comparisons between species such as Homo sapiens and Mus musculus, including alignments displayed in the UCSC Genome Browser. Like BLAST and BLASTZ, it follows a seed-and-extend strategy, and it adds more flexible seeding, scoring, and output options. The tool is commonly run in workflows alongside RepeatMasker (to soft-mask repeats before alignment), utilities for processing MAF and related alignment files, and genome browsers such as the one maintained at the University of California, Santa Cruz.

Methods and Algorithm

Lastz employs a seed-and-extend algorithm with spaced seeds and configurable substitution scores; the approach is conceptually related to the algorithms in BLAST and FASTA (bioinformatics) and descends directly from BLASTZ. It scans the target and query sequences for matching seed words, extends each seed hit without gaps into a high-scoring segment pair (HSP), optionally chains the HSPs, and then performs gapped extension by dynamic programming with affine gap penalties, in the manner of the Needleman–Wunsch and Smith–Waterman algorithms. To manage repetitive DNA, Lastz can exclude soft-masked (lowercase) sequence, such as that produced by RepeatMasker, from seeding, can discard seed words that occur too frequently, and can dynamically mask target positions that attract an excessive number of alignments. Chained alignments are commonly post-processed for synteny detection using methods developed by comparative genomics groups such as those at UCSC and Ensembl.
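The seeding and ungapped extension steps described above can be sketched in a few lines of Python. This is an illustrative toy, not Lastz's implementation: the function names, the +1/-1 scores, and the x-drop value are assumptions chosen for readability, and the 19-position pattern is the 12-of-19 spaced seed usually cited as the Lastz default.

```python
from collections import defaultdict

# 12-of-19 spaced seed: '1' positions must match, '0' positions are ignored.
# Assumed here to mirror the commonly cited Lastz default pattern.
SEED = "1110100110010101111"

def seed_key(seq, pos, pattern):
    """Return the bases under the pattern's '1' positions, or None if unusable."""
    window = seq[pos:pos + len(pattern)]
    if len(window) < len(pattern):
        return None
    key = "".join(base for base, p in zip(window, pattern) if p == "1")
    # Only uppercase A/C/G/T words are indexed (N and lowercase are skipped).
    return key if set(key) <= set("ACGT") else None

def find_seed_hits(target, query, pattern=SEED):
    """Index every target seed word, then look up each query window.

    Returns a list of (target_pos, query_pos) seed hits.
    """
    index = defaultdict(list)
    for t in range(len(target) - len(pattern) + 1):
        key = seed_key(target, t, pattern)
        if key is not None:
            index[key].append(t)
    hits = []
    for q in range(len(query) - len(pattern) + 1):
        key = seed_key(query, q, pattern)
        if key is not None:
            hits.extend((t, q) for t in index.get(key, []))
    return hits

def extend_right(target, query, t, q, match=1, mismatch=-1, xdrop=3):
    """Ungapped extension to the right with an x-drop stopping rule.

    Extension stops once the running score falls more than `xdrop`
    below the best score seen. Returns (best_score, best_length).
    """
    score = best = best_len = 0
    i = 0
    while t + i < len(target) and q + i < len(query):
        score += match if target[t + i] == query[q + i] else mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        elif best - score > xdrop:
            break
    return best, best_len
```

With the short pattern `"101"`, `find_seed_hits("GATTACA", "GCTTG", "101")` reports a hit at `(0, 0)` even though the middle bases differ, which is the point of a spaced seed: the ignored positions tolerate substitutions that would break a contiguous word match.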

Input and Output Formats

Lastz accepts nucleotide sequences in FASTA format, as well as the compact nib and 2bit formats used by UCSC tools. It is intended for assembled sequences rather than raw reads, although the assemblies themselves may derive from data produced on platforms such as Illumina, PacBio, and Oxford Nanopore Technologies. Command-line options specify the scoring matrix, gap penalties, and seed pattern. Output formats include LAV, AXT, MAF, SAM, CIGAR, and a configurable tab-delimited summary; MAF output can feed multiple-alignment pipelines, and alignments can be converted for visualization in genome browsers such as IGV and the UCSC Genome Browser. Integration scripts commonly convert Lastz output into formats consumed by downstream tools such as BEDTools and SAMtools.
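As an illustration of consuming MAF output downstream, the following minimal sketch parses MAF text into blocks and computes percent identity for a pairwise block. It assumes the standard MAF layout (an `a` line with a `score=` attribute, followed by `s` lines carrying source, start, size, strand, source size, and aligned text); the function names and the dictionary structure are hypothetical, and other MAF line types are simply ignored.

```python
def parse_maf(text):
    """Parse MAF text into a list of blocks, each with a score and its rows."""
    blocks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if fields[0] == "a":
            # Start a new alignment block; attributes look like key=value.
            attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            score = float(attrs["score"]) if "score" in attrs else None
            current = {"score": score, "rows": []}
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            _, src, start, size, strand, src_size, seq = fields
            current["rows"].append({
                "src": src,
                "start": int(start),      # zero-based start on the given strand
                "size": int(size),        # aligned bases, excluding gaps
                "strand": strand,
                "src_size": int(src_size),
                "text": seq,              # aligned text, '-' marks a gap
            })
    return blocks

def block_identity(block):
    """Fraction of gap-free columns that match in the first two rows."""
    a, b = (row["text"].upper() for row in block["rows"][:2])
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in cols) / len(cols) if cols else 0.0

example = """##maf version=1
a score=1234
s chr1 100 10 + 1000 ACGTACGTAC
s chrA 50 10 - 900 ACGTTCGTAC
"""
blocks = parse_maf(example)
```

A filter such as `[b for b in blocks if block_identity(b) >= 0.8]` is one simple way to keep only well-conserved blocks before handing them to other tools.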

Performance and Accuracy

Benchmarks comparing Lastz with tools such as BLASTZ, MUMmer, and Minimap2 show trade-offs between sensitivity, specificity, and runtime that depend on sequence divergence and repeat content. On well-assembled vertebrate chromosomes, such as the reference assemblies maintained by the Genome Reference Consortium, Lastz is reported to detect orthologous blocks with high sensitivity, and false positives are typically controlled by masking repeats with RepeatMasker before alignment. Seed patterns and scoring parameters strongly affect both runtime and sensitivity, and practitioners often compare Lastz results with alignments from LAST and BWA-MEM when evaluating structural variation or synteny. Lastz is single-threaded, so whole-genome alignments, such as those between assemblies from Ensembl or NCBI RefSeq, are usually scaled by splitting the sequences into chunks and running many jobs in parallel.
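One common way to quantify such trade-offs is to compare the aligned base pairs an aligner reports against a trusted reference alignment. The sketch below is an assumption about how such an evaluation might be set up, not a description of any published Lastz benchmark; the function names are hypothetical, and it reports precision (the share of predicted pairs that are correct) alongside sensitivity rather than specificity in the strict statistical sense.

```python
def aligned_pairs(t_start, q_start, t_text, q_text):
    """Expand one gapped alignment into a set of (target_pos, query_pos) pairs."""
    pairs = set()
    t, q = t_start, q_start
    for x, y in zip(t_text, q_text):
        if x != "-" and y != "-":
            pairs.add((t, q))   # both sequences contribute a base here
        if x != "-":
            t += 1              # advance only where the target has a base
        if y != "-":
            q += 1              # advance only where the query has a base
    return pairs

def sensitivity_precision(predicted, truth):
    """Return (sensitivity, precision) over sets of aligned base pairs."""
    true_positives = len(predicted & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision

# The reference alignment contains a one-base gap; the prediction misses it,
# so every pair after the gap is shifted and counted as wrong.
truth = aligned_pairs(0, 0, "ACGT-ACG", "ACGTTACG")
predicted = aligned_pairs(0, 0, "ACGTACG", "ACGTTAC")
sens, prec = sensitivity_precision(predicted, truth)
```

Counting at the level of base pairs penalizes misplaced gaps as well as missing blocks, which makes differences in seed and scoring settings visible even when two runs report the same number of alignments.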

Applications

Lastz is used in whole-genome alignment projects for comparative genomics between species including Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and diverse non-model organisms. It is employed in workflows for detecting conserved noncoding elements and in synteny mapping, including pairwise alignment pipelines run by Ensembl and for the UCSC Genome Browser. Other applications include aligning assembled contigs against a related reference for scaffolding, comparative annotation with resources such as RefSeq, and, less commonly, preprocessing steps in variant discovery pipelines.

Development and Availability

Lastz was developed by Robert S. Harris in Webb Miller's laboratory at Pennsylvania State University, where it formed part of his 2007 doctoral work, and it remains available as open-source C code for Unix-like systems. Source code has been distributed through the Miller Lab website and, more recently, through a public GitHub repository; packaged builds are available through community channels such as Bioconda. Users typically obtain Lastz from these sources or through pipelines managed by workflow systems such as Snakemake and Cromwell. Community support takes place mainly through the project's issue tracker and general bioinformatics forums.
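When Lastz is wrapped in a pipeline step, the call usually reduces to assembling a command line and checking that the binary is installed. The following is a minimal sketch under stated assumptions: the helper names and file names are hypothetical, the sketch only builds the argument list without running anything, and the options shown (`--format`, `--seed`, `--output`, `--chain`) should be checked against the documentation of the installed Lastz version.

```python
import shutil

def build_lastz_command(target, query, out_path, fmt="maf",
                        seed="12of19", chain=True):
    """Assemble an argument list for one pairwise run; nothing is executed."""
    cmd = [
        "lastz", target, query,
        f"--format={fmt}",        # output format, for example maf or axt
        f"--seed={seed}",         # spaced-seed pattern name
        f"--output={out_path}",   # write alignments to a file, not stdout
    ]
    if chain:
        cmd.append("--chain")     # keep the best chain of HSPs per pair
    return cmd

def lastz_available():
    """Report whether a `lastz` executable can be found on PATH."""
    return shutil.which("lastz") is not None

# Hypothetical file names, used only to show the resulting argument list.
cmd = build_lastz_command("target.fa", "query.fa", "out.maf")
```

A workflow rule could then pass the list to `subprocess.run(cmd, check=True)` after confirming `lastz_available()`, which keeps the command construction testable separately from the actual alignment run.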

Category:Bioinformatics software