| BWA (software) | |
|---|---|
| Name | BWA |
| Operating system | Unix-like, Linux, macOS |
| Genre | Bioinformatics software, Sequence alignment |
| License | BSD-like |
BWA (Burrows–Wheeler Aligner) is a software package for fast alignment of sequencing reads against large reference genomes. It was developed for high-throughput sequencing data and is widely used in genomics pipelines for variant discovery, population genetics and clinical sequencing. BWA indexes the reference with the Burrows–Wheeler transform and an FM-index. It provides separate algorithms for short reads and for longer reads, and it writes alignments in the SAM format read by most downstream tools.
BWA was developed by Heng Li, initially at the Wellcome Trust Sanger Institute. It gained adoption alongside sequencing platforms from Illumina, Roche Diagnostics, Life Technologies, Pacific Biosciences and Oxford Nanopore Technologies, and in community projects such as the 1000 Genomes Project and The Cancer Genome Atlas. Its index is built on the Burrows–Wheeler transform and the FM-index, the same foundation used by contemporaneous aligners such as Bowtie and SOAP2. BWA succeeded MAQ, an earlier hash-based aligner by the same author, and is frequently compared with other aligners such as Novoalign and LAST. BWA writes alignments in the SAM format, which can be processed by SAMtools, Picard, GATK and BEDTools; variant calls derived from these alignments are typically stored in VCF. Such pipelines run at institutions including the Broad Institute, the European Bioinformatics Institute, the Wellcome Sanger Institute, the National Institutes of Health and clinical sequencing centers.
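The Python sketch below shows how an FM-index answers substring queries. It builds a toy index (suffix array, BWT, the C table and occurrence counts) and runs exact backward search. It also adds a substitution-only backtracking search in the spirit of BWA-backtrack. The class name `ToyFMIndex`, the naive suffix-array construction and the uncompressed occurrence tables are hypothetical teaching choices. BWA's real index uses compact, sampled data structures and its backtracking search also handles gaps.

```python
class ToyFMIndex:
    """Toy FM-index over a DNA string (illustration only, not BWA's layout)."""

    def __init__(self, text: str) -> None:
        text += "$"  # sentinel that sorts before A, C, G and T
        # Naive suffix array: fine for toy inputs, far too slow for genomes.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        # BWT[i] is the character preceding suffix SA[i] ('$' wraps around).
        self.bwt = "".join(text[i - 1] for i in self.sa)
        self.n = len(text)
        # C[c]: number of characters in the text that sort before c.
        self.C = {}
        total = 0
        for c in sorted(set(text)):
            self.C[c] = total
            total += text.count(c)
        # occ[c][i]: occurrences of c in bwt[:i].
        self.occ = {c: [0] * (self.n + 1) for c in self.C}
        for i, ch in enumerate(self.bwt):
            for c in self.C:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def _extend(self, c, lo, hi):
        """Backward extension: SA interval of c+W from the interval of W."""
        return self.C[c] + self.occ[c][lo], self.C[c] + self.occ[c][hi]

    def exact(self, pattern):
        """Sorted start positions of exact occurrences of pattern."""
        lo, hi = 0, self.n  # half-open interval [lo, hi) over the SA
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo, hi = self._extend(c, lo, hi)
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])

    def with_mismatches(self, pattern, max_mm):
        """Backtracking search allowing up to max_mm substitutions (no gaps)."""
        hits = []

        def recurse(i, lo, hi, mm):
            if lo >= hi:  # empty interval: this branch matches nothing
                return
            if i < 0:  # whole pattern consumed
                hits.extend(self.sa[lo:hi])
                return
            for c in "ACGT":
                if c not in self.C:
                    continue
                cost = mm + (c != pattern[i])
                if cost <= max_mm:
                    recurse(i - 1, *self._extend(c, lo, hi), cost)

        recurse(len(pattern) - 1, 0, self.n, 0)
        return sorted(hits)
```

For example, `ToyFMIndex("ACGTACGT").exact("ACG")` returns `[0, 4]`. `exact("ACGA")` finds nothing, but `with_mismatches("ACGA", 1)` returns `[0, 4]` because each `ACGT` differs from `ACGA` by one substitution.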
BWA implements three alignment algorithms tailored to different read characteristics.

BWA-backtrack (`bwa aln`, followed by `bwa samse` or `bwa sampe`) is designed for Illumina reads up to about 100 bp. It searches the FM-index with backtracking and allows a limited number of mismatches and gaps, an approach similar to that used in Bowtie.

BWA-SW (`bwa bwasw`) and BWA-MEM (`bwa mem`) handle longer sequences, from about 70 bp to around 1 Mbp. BWA-SW combines the FM-index with heuristic Smith–Waterman alignment. BWA-MEM seeds alignments with maximal exact matches and extends them with affine-gap Smith–Waterman alignment. Because it performs local alignment, as BLAST and LAST do, BWA-MEM can report split alignments for chimeric reads. It is generally the recommended mode for high-quality reads of 70 bp or longer.

Scoring is controlled by parameters such as the mismatch penalty, the gap open and extension penalties, and the minimum seed length. In `bwa mem` these are set with `-B`, `-O`, `-E` and `-k`.
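The gapped scoring described above can be sketched as a Smith–Waterman local alignment with affine gap penalties, in Gotoh's formulation. The sketch assumes the default `bwa mem` scores: match 1, mismatch 4, gap open 6 and gap extension 1, with a gap of length k costing 6 + k. The function name `local_align_score` is hypothetical. BWA's own extension step also applies banding, seed-anchored extension and end-clipping penalties, none of which are modeled here.

```python
def local_align_score(query, target, match=1, mismatch=4, gap_open=6, gap_extend=1):
    """Best Smith-Waterman local score with affine gaps (Gotoh).

    A gap of length k costs gap_open + k * gap_extend.
    """
    NEG = float("-inf")
    m, n = len(query), len(target)
    H = [[0] * (n + 1) for _ in range(m + 1)]    # best score ending at (i, j)
    E = [[NEG] * (n + 1) for _ in range(m + 1)]  # ends with a gap in the query
    F = [[NEG] * (n + 1) for _ in range(m + 1)]  # ends with a gap in the target
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Open a new gap (pay open + one extension) or extend an existing one.
            E[i][j] = max(H[i][j - 1] - (gap_open + gap_extend), E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - (gap_open + gap_extend), F[i - 1][j] - gap_extend)
            s = match if query[i - 1] == target[j - 1] else -mismatch
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these scores a mismatch (4) is cheaper than a 1-bp gap (6 + 1 = 7). A single inserted base is therefore often absorbed as a mismatch: `local_align_score("A" * 10 + "C" * 10, "A" * 10 + "G" + "C" * 10)` gives 15, not 13. A longer insertion is handled as one gap. `local_align_score("A" * 20 + "C" * 20, "A" * 20 + "G" * 5 + "C" * 20)` gives 29, that is 40 matches minus 6 + 5 × 1 = 11 for the 5-bp gap.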
Performance benchmarks typically compare BWA with other DNA read aligners such as Bowtie 2, Novoalign, SOAP2 and MAQ, and, for long reads, with Minimap2. Spliced aligners such as STAR and HISAT2 are designed for RNA-seq and address a different task, since BWA does not model introns. Published comparisons in journals such as Bioinformatics, Genome Research and Nature Methods report trade-offs between speed, memory usage and alignment accuracy, which in turn affect SNP and indel calling. Accuracy also differs between short and long reads. Real-world performance depends on the hardware: the number of CPU cores, the available memory and storage throughput, whether on local servers or on cloud instances from Amazon Web Services, Google Cloud Platform and Microsoft Azure.
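Accuracy comparisons of this kind are often scored on simulated reads whose true origin is known. The minimal Python sketch below computes, for several MAPQ cutoffs, the fraction of reads kept and the error rate among them. It also converts MAPQ to an error probability using the SAM definition, MAPQ = -10 log10 P(mapping position is wrong). The names `SimulatedRead` and `accuracy_by_mapq`, the 5-bp position tolerance and the omission of chromosome and strand checks are all simplifying assumptions, not any published benchmark's protocol.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SimulatedRead:
    true_pos: int              # position the read was simulated from
    mapped_pos: Optional[int]  # position reported by the aligner (None = unmapped)
    mapq: int                  # reported mapping quality


def mapq_to_error_prob(mapq: int) -> float:
    """Invert the SAM definition MAPQ = -10 * log10(P(wrong position))."""
    return 10 ** (-mapq / 10)


def accuracy_by_mapq(
    reads: Sequence[SimulatedRead], cutoffs: Sequence[int], tolerance: int = 5
) -> List[Tuple[int, float, float]]:
    """Return (cutoff, fraction of reads kept, error rate among kept) per cutoff."""
    rows = []
    for cutoff in cutoffs:
        kept = [r for r in reads if r.mapped_pos is not None and r.mapq >= cutoff]
        # A kept read is "wrong" if it lands more than `tolerance` bp from its origin.
        wrong = sum(abs(r.mapped_pos - r.true_pos) > tolerance for r in kept)
        error_rate = wrong / len(kept) if kept else 0.0
        rows.append((cutoff, len(kept) / len(reads), error_rate))
    return rows
```

Raising the cutoff trades sensitivity for precision. Plotting the kept fraction against the error rate across cutoffs is one common way to present such comparisons. The input records here would come from a read simulator and a parsed SAM file, both outside the scope of the sketch.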
BWA is implemented in C and distributed as a single command-line program, `bwa`, for Unix-like systems including Linux and macOS. Packages are available through the Bioconda channel for Conda, Debian and Homebrew. The `bwa index` subcommand builds the FM-index of a reference once. After that, `bwa mem` (or the older `bwa aln`/`bwa samse`/`bwa sampe` and `bwa bwasw`) aligns reads and writes SAM to standard output, which is commonly piped into SAMtools to produce sorted BAM files. These steps are combined with SAMtools, Picard and GATK in workflows managed by tools such as Snakemake and Nextflow or by platforms developed at the Broad Institute and the European Bioinformatics Institute. The workflows run on compute clusters managed with SLURM or Sun Grid Engine (SGE), or on Kubernetes in the cloud. The tool supports single-end and paired-end data from Illumina instruments such as the NovaSeq, HiSeq and MiSeq. For long reads, `bwa mem` provides presets (`-x pacbio`, `-x ont2d`) for Pacific Biosciences and Oxford Nanopore Technologies sequencers.
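The indexing, alignment and post-processing steps mentioned above can be sketched as a small Python driver. It builds the command lines and, if asked, runs them with `bwa mem` piped into `samtools sort`. The helper names `bwa_pipeline_commands` and `run_pipeline`, the file names and the read-group values are hypothetical. The flags used (`bwa mem -t/-R/-x`, `samtools sort -@/-o`, `samtools index`) are standard options, but check them against the installed versions.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def bwa_pipeline_commands(
    ref: str,
    reads: Sequence[str],
    out_bam: str,
    threads: int = 4,
    read_group: Optional[str] = None,
    preset: Optional[str] = None,
) -> List[List[str]]:
    """Return argv lists for: index reference, align, sort to BAM, index BAM."""
    index_ref = ["bwa", "index", ref]  # build the FM-index once per reference
    align = ["bwa", "mem", "-t", str(threads)]
    if preset:  # e.g. "pacbio" or "ont2d" for long reads
        align += ["-x", preset]
    if read_group:  # bwa turns the literal "\t" escapes into tabs
        align += ["-R", read_group]
    align += [ref, *reads]  # one FASTQ for single-end, two for paired-end
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]  # "-" = stdin
    index_bam = ["samtools", "index", out_bam]
    return [index_ref, align, sort, index_bam]


def run_pipeline(cmds: List[List[str]]) -> None:
    """Run the commands, piping the SAM output of bwa mem into samtools sort."""
    index_ref, align, sort, index_bam = cmds
    subprocess.run(index_ref, check=True)
    mem = subprocess.Popen(align, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort, stdin=mem.stdout)
    mem.stdout.close()  # bwa sees a broken pipe if samtools exits early
    for proc, argv in ((sorter, sort), (mem, align)):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
    subprocess.run(index_bam, check=True)


if __name__ == "__main__":
    cmds = bwa_pipeline_commands(
        "ref.fa", ["sample_R1.fq.gz", "sample_R2.fq.gz"], "sample.sorted.bam",
        threads=8, read_group=r"@RG\tID:sample\tSM:sample",
    )
    for argv in cmds:
        print(shlex.join(argv))  # dry run: print the commands instead of running them
```

Building argument lists rather than shell strings avoids quoting problems with the tab-escaped read-group header. In a cluster setting the same commands would typically be emitted by a workflow manager rather than run directly.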
BWA is widely used in pipelines for:

- variant discovery, as in the 1000 Genomes Project;
- somatic mutation detection in cancer studies such as The Cancer Genome Atlas;
- structural variant analysis in population resources such as the Genome Aggregation Database (gnomAD);
- metagenomics in initiatives such as the Human Microbiome Project;
- clinical diagnostics in programs run by Genomics England and regional health systems affiliated with NHS England.

Typical integration points include quality control with FastQC, trimming with Trimmomatic or Cutadapt, alignment post-processing with SAMtools and Picard, and variant calling with GATK, FreeBayes or Strelka. Variants are annotated with ANNOVAR or the Variant Effect Predictor (VEP) and visualized with IGV, the UCSC Genome Browser and the Ensembl genome browser.
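Post-processing steps such as those performed with SAMtools and Picard commonly start by filtering alignment records on their SAM FLAG bits and MAPQ. The Python sketch below decodes the FLAG bits defined in the SAM specification. It keeps only primary, mapped, non-duplicate records above a MAPQ cutoff, which also drops the supplementary records that `bwa mem` uses for split alignments. The function names and the cutoff of 20 are hypothetical choices for illustration, not a recommended filter.

```python
from typing import Iterable, Iterator

# FLAG bits defined by the SAM specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "read1", 0x80: "read2",
    0x100: "secondary", 0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}

# Record types this sketch skips before downstream analysis.
SKIP = {"unmapped", "secondary", "supplementary", "qc_fail", "duplicate"}


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into the fields used below."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "qname": fields[0],
        "flag_names": {name for bit, name in FLAG_BITS.items() if flag & bit},
        "rname": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


def confident_primary(lines: Iterable[str], min_mapq: int = 20) -> Iterator[dict]:
    """Yield primary, mapped, non-duplicate records with MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):  # header lines such as @HD, @SQ, @RG, @PG
            continue
        rec = parse_sam_line(line)
        if not (rec["flag_names"] & SKIP) and rec["mapq"] >= min_mapq:
            yield rec
```

For example, a FLAG of 99 decodes to a paired, properly paired first read whose mate is on the reverse strand. A FLAG of 2147 adds the supplementary bit (0x800) and is filtered out. In practice the same filtering is usually done with `samtools view` options rather than hand-written code.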
The project has evolved through a series of releases, developed mainly by Heng Li with contributions from the wider community. Its source code was originally distributed through SourceForge and is now hosted on GitHub. Its open-source license allows academic, commercial and clinical use.
Category:Bioinformatics software