Generated by Llama 3.3-70B

| Burrows-Wheeler transform | |
|---|---|
| Name | Burrows-Wheeler transform |
| Data structure | String |
The Burrows-Wheeler transform (BWT) is a reversible transformation of a string that rearranges its symbols so that identical symbols tend to be grouped into runs. The transform does not shrink the string by itself, but it usually makes it easier to compress: in the bzip2 compressor, for example, the BWT is followed by move-to-front and run-length coding and then Huffman coding. The transform is named after Michael Burrows and David Wheeler, who described it in a 1994 technical report on block-sorting lossless data compression. It is now widely used in data compression, text indexing, and bioinformatics.
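Reversibility can be checked with a short sketch. The Python below is a minimal illustration, not a reference implementation: it assumes an end marker `$` that does not occur in the input and sorts before every input symbol, builds the transform naively by sorting rotations, and inverts it with the last-to-first (LF) mapping. The function names `bwt` and `inverse_bwt` are hypothetical.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort every rotation of text+sentinel and keep the last column."""
    # Assumption: the sentinel is unique and sorts before every symbol of text.
    assert sentinel not in text and all(sentinel < ch for ch in text)
    s = text + sentinel
    matrix = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in matrix)


def inverse_bwt(last: str) -> str:
    """Recover the original text from its BWT via the last-to-first (LF) mapping."""
    n = len(last)
    # A stable sort of the last column yields the first column; equal symbols
    # keep their relative order, which is exactly what the LF mapping relies on.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for row, i in enumerate(order):
        lf[i] = row  # the occurrence last[i] is the first symbol of row `row`
    # Row 0 begins with the sentinel, so its last symbol is the final text symbol.
    row, out = 0, []
    for _ in range(n - 1):  # n - 1 text symbols; the sentinel is skipped
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

The naive forward step stores every rotation, so it only suits short strings; the inverse itself needs just one stable sort and a linear walk.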
The BWT is closely related to the suffix array, a data structure introduced by Udi Manber and Gene Myers: once the input is terminated by a unique end marker, the transformed string can be read directly off the suffix array. Paolo Ferragina and Giovanni Manzini built on this connection to define the FM-index, a compressed full-text index that supports substring search using the transformed string and a few auxiliary tables. BWT-based indexes are used in genomics software such as BWA, developed by Heng Li and Richard Durbin, and Bowtie, developed by Ben Langmead, Steven Salzberg, and colleagues.
Conceptually, the transform appends a unique end marker (written $ here) that sorts before every other symbol, forms the matrix of all cyclic rotations of the resulting string, sorts the rows lexicographically, and outputs the last column. Practical implementations avoid building this matrix, which needs quadratic space. Because the end marker is unique and smallest, sorting the rotations gives the same order as sorting the suffixes, so the transform can be computed from a suffix array: for each suffix in sorted order, output the symbol that precedes it in the string, wrapping around to the end marker for the suffix that starts at position 0.
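Both routes mentioned above, the rotation matrix and the suffix array, produce the same output when an end marker is used. In this hypothetical sketch, `rotation_bwt` builds and sorts the full rotation matrix, while `suffix_array_bwt` sorts suffix start positions and emits the symbol preceding each one. Both assume a `$` marker that sorts first, and both sort naively, so they are only meant for short strings.

```python
def rotation_bwt(text: str, sentinel: str = "$") -> str:
    """BWT via the explicit matrix of sorted rotations (O(n^2) memory)."""
    s = text + sentinel
    matrix = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in matrix)


def suffix_array_bwt(text: str, sentinel: str = "$") -> str:
    """BWT read off a suffix array: emit the symbol preceding each sorted suffix."""
    s = text + sentinel
    # Naive suffix array for illustration: starting positions sorted by suffix.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # For the suffix starting at 0, s[-1] wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)


if __name__ == "__main__":
    for word in ("banana", "mississippi"):
        assert rotation_bwt(word) == suffix_array_bwt(word)
    print(suffix_array_bwt("banana"))  # annb$aa
```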
The BWT has applications in data compression, text indexing, and bioinformatics. Its best-known bioinformatics use is short-read alignment: aligners such as BWA and Bowtie index a reference genome with a BWT-based FM-index and use it to find where sequencing reads match the reference, with a small memory footprint. Classical sequence-comparison methods such as BLAST and the Smith-Waterman and Needleman-Wunsch algorithms predate the BWT and do not depend on it. BWT-based aligners have been used to map the large volumes of reads produced by resequencing projects such as the 1000 Genomes Project.
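Aligners built on an FM-index find exact matches by backward search: they read the pattern from its last symbol to its first, narrowing a range of rows in the sorted rotation matrix at each step. The sketch below counts occurrences this way. It is a simplification under stated assumptions: the helper names `build_bwt` and `count_occurrences` are hypothetical, `$` is assumed to be an end marker that sorts first, and the rank function scans the string instead of using the sampled rank tables a real FM-index keeps, so each step costs O(n) rather than O(1).

```python
def build_bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT from sorted suffixes (sentinel assumed unique and smallest)."""
    s = text + sentinel
    return "".join(s[i - 1] for i in sorted(range(len(s)), key=lambda i: s[i:]))


def count_occurrences(last: str, pattern: str) -> int:
    """Count occurrences of a non-empty pattern in the original text by backward search."""
    # C[c] = number of symbols in the text (sentinel included) smaller than c.
    C, total = {}, 0
    for ch in sorted(set(last)):
        C[ch] = total
        total += last.count(ch)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in last[:i]; a real FM-index answers this from
        # precomputed rank samples instead of scanning.
        return last.count(ch, 0, i)

    lo, hi = 0, len(last)  # current range of matrix rows [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    last = build_bwt("banana")
    print(count_occurrences(last, "ana"))  # 2
    print(count_occurrences(last, "nab"))  # 0
```

Each step maps a row range for a pattern suffix to the range for one symbol longer, which is why the search never touches the original text.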
For example, appending the end marker $ (assumed to sort before all letters) to "banana" gives "banana$". Its seven rotations sort to $banana, a$banan, ana$ban, anana$b, banana$, na$bana, and nana$ba, so the transform is the last column, "annb$aa". The formulation in the original 1994 report uses no end marker: the six rotations of "banana" sort to abanan, anaban, ananab, banana, nabana, and nanaba, and the output is the last column, "nnbaaa", together with the index of the row holding the original string (row 3, counting from 0), which the decoder needs to invert the transform. In both versions repeated symbols end up adjacent, as in the runs "nn" and "aaa" of "nnbaaa".
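The marker-free version of this example can be reproduced with the sketch below, which returns the last column together with the row index of the original string. `bwt_with_index` is a hypothetical name, and the function assumes a non-empty input.

```python
def bwt_with_index(text: str) -> tuple[str, int]:
    """Marker-free BWT: return the last column and the row of the original text."""
    if not text:
        raise ValueError("text must be non-empty")
    n = len(text)
    # Sort rotation start positions by the rotation each one begins.
    starts = sorted(range(n), key=lambda i: text[i:] + text[:i])
    # The last symbol of the rotation starting at i is text[i - 1] (wrapping at 0).
    last = "".join(text[i - 1] for i in starts)
    return last, starts.index(0)


if __name__ == "__main__":
    word = "banana"
    for start in sorted(range(len(word)), key=lambda i: word[i:] + word[:i]):
        print(word[start:] + word[:start])  # abanan, anaban, ..., nanaba
    print(bwt_with_index(word))  # ('nnbaaa', 3)
```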
The running time depends on how the sorting is done. Sorting the n rotations with a comparison sort uses O(n log n) comparisons, but each comparison may examine up to n symbols, so the naive method takes O(n^2 log n) time in the worst case, and O(n^2) space if the matrix is stored. Computing the suffix array instead is much cheaper: the prefix-doubling algorithm of Manber and Myers runs in O(n log n) time, and linear-time algorithms such as the skew algorithm of Kärkkäinen and Sanders and SA-IS make the whole transform O(n). For a fixed alphabet, the inverse transform also runs in O(n) time using the last-to-first (LF) mapping, which links each symbol in the last column to the same occurrence in the sorted first column.
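As a sketch of the suffix array route, the code below uses prefix doubling: after the round with step k, suffixes are ranked by their first 2k symbols, so O(log n) rounds suffice. With Python's comparison sort each round costs O(n log n), giving O(n log^2 n) overall rather than the O(n log n) achievable with radix sort or the O(n) of linear-time algorithms. The names `suffix_array` and `bwt_from_suffix_array` are hypothetical, and `$` is assumed to sort before every input symbol.

```python
def suffix_array(s: str) -> list[int]:
    """Prefix-doubling suffix array: O(log n) rounds of an O(n log n) sort."""
    n = len(s)
    if n == 0:
        return []
    rank = [ord(ch) for ch in s]  # initial ranks: the first symbol of each suffix
    sa = list(range(n))
    k = 1
    while True:
        def key(i: int) -> tuple[int, int]:
            # Rank by the first 2k symbols: (rank of first k, rank of next k).
            # -1 marks "suffix ends here", which sorts before any real rank.
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: the order is final
            return sa
        k *= 2


def bwt_from_suffix_array(text: str, sentinel: str = "$") -> str:
    """BWT as the symbol preceding each suffix of text+sentinel in sorted order."""
    s = text + sentinel
    return "".join(s[i - 1] for i in suffix_array(s))


if __name__ == "__main__":
    print(suffix_array("banana"))           # [5, 3, 1, 0, 4, 2]
    print(bwt_from_suffix_array("banana"))  # annb$aa
```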
The suffix array and the LCP array are not variants of the BWT but companion data structures that are often built alongside it. The suffix array lists the starting positions of all suffixes of the input string in lexicographic order. The LCP array stores, for each pair of suffixes that are adjacent in that order, the length of their longest common prefix. Many BWT-based algorithms and tools build on these structures, including BWT-SW, a local alignment method by Tak-Wah Lam and colleagues, and BWA with its BWA-SW mode for longer reads, developed by Heng Li and Richard Durbin.
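Given a suffix array, the LCP array can be computed in linear time with the algorithm of Kasai and colleagues, which relies on the fact that moving from the suffix at position i to the one at i + 1 lowers the common-prefix length with its sorted predecessor by at most one. A minimal sketch follows; it uses a naive suffix array for brevity, and the name `kasai_lcp` is hypothetical. Here `lcp[r]` holds the common-prefix length of the suffixes at ranks r-1 and r, with `lcp[0]` set to 0.

```python
def kasai_lcp(s: str, sa: list[int]) -> list[int]:
    """LCP array in O(n): lcp[r] = common-prefix length of suffixes sa[r-1], sa[r]."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0  # common-prefix length carried over from the previous suffix
    for i in range(n):  # visit suffixes in text order, not rank order
        if rank[i] == 0:
            h = 0  # the smallest suffix has no predecessor
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 symbols with its predecessor
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    print(sa)                   # [5, 3, 1, 0, 4, 2]
    print(kasai_lcp(text, sa))  # [0, 1, 3, 0, 0, 2]
```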