Clustal — LLMpedia

Clustal
Name	Clustal
Developer	Des Higgins and collaborators (Trinity College Dublin; European Molecular Biology Laboratory; University College Dublin; EMBL-EBI)
Released	1988
Latest release version	varies
Programming language	C; C++
Operating system	Unix; Linux; Microsoft Windows; macOS
Genre	Multiple sequence alignment
License	varies

Contents

History
Algorithm and methods
Software versions and implementations
Applications and usage
Performance and benchmarks
Limitations and alternatives

Clustal is a family of widely used multiple sequence alignment programs developed for biological sequence analysis. The lineage arose in the late 1980s and evolved into widely cited packages, including ClustalW, ClustalX, and Clustal Omega, that are used across molecular biology, genomics, structural biology, and evolutionary studies. Clustal implementations are integrated into bioinformatics infrastructures and appear in workflows alongside resources such as GenBank, UniProt, and the Protein Data Bank, as well as services hosted at the European Bioinformatics Institute.

History

Clustal was first written by Des Higgins and Paul Sharp at Trinity College Dublin and described in 1988. Later development took place at institutions including the European Molecular Biology Laboratory, University College Dublin, and EMBL-EBI. Early versions built on pairwise alignment concepts stemming from the Needleman–Wunsch and Smith–Waterman algorithms, combining pairwise comparisons progressively into a multiple alignment. The project was contemporaneous with the expansion of GenBank and with annotation initiatives at Swiss-Prot, which prompted integration with community sequence resources. Over successive releases, contributions from scientists in groups funded by bodies such as the Wellcome Trust and the National Institutes of Health shaped feature additions, web services, and distribution through repositories run by the European Molecular Biology Laboratory and bioinformatics centers such as EMBL-EBI.

Algorithm and methods

Clustal implements a progressive alignment strategy in three stages. First, every pair of sequences is compared, either with a fast heuristic or with full dynamic programming in the style of the Needleman–Wunsch algorithm, and the resulting scores are converted into pairwise distances, much as in distance-based phylogenetics. Second, the distance matrix is clustered into a guide tree, using methods such as UPGMA or neighbor-joining depending on the version. Third, sequences and groups of already aligned sequences (profiles) are aligned progressively in the order given by the tree topology, starting with the most similar pairs. Sequence weighting schemes down-weight closely related sequences so that they do not dominate a profile. Commonly used scoring matrices come from the PAM lineage and the BLOSUM family, both referenced frequently in work published in journals such as Nature and Science.
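The first two stages of a progressive aligner, pairwise distance estimation and guide tree construction, can be sketched in a few lines of Python. This is an illustrative sketch only, not Clustal's actual code: the function names (nw_identity, upgma, guide_tree), the simple match/mismatch/gap scores, and the use of UPGMA with a "1 minus fractional identity" distance are all assumptions made for the example. Real Clustal versions use substitution matrices, affine gap penalties, and version-specific clustering.

```python
from itertools import combinations


def nw_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b (Needleman-Wunsch, linear gap cost) and
    return the fraction of alignment columns holding identical residues."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical and total columns.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return same / cols if cols else 1.0


def upgma(names, dist):
    """Average-linkage clustering. dist maps frozenset({x, y}) -> distance.
    Returns the guide tree as nested 2-tuples of sequence names."""
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        size_x, size_y = clusters.pop(x), clusters.pop(y)
        for other in clusters:
            # Size-weighted average distance from the new cluster to each other one.
            d[frozenset((merged, other))] = (
                size_x * d[frozenset((x, other))]
                + size_y * d[frozenset((y, other))]
            ) / (size_x + size_y)
        clusters[merged] = size_x + size_y
    return next(iter(clusters))


def guide_tree(seqs):
    """seqs maps name -> sequence; distance is 1 - fractional identity."""
    dist = {frozenset((p, q)): 1.0 - nw_identity(seqs[p], seqs[q])
            for p, q in combinations(seqs, 2)}
    return upgma(list(seqs), dist)


# Made-up toy sequences: s1 and s2 differ at one position, s3 is unrelated.
tree = guide_tree({"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"})
print(tree)  # ('s3', ('s1', 's2'))
```

The nested tuples give the order of the third stage: the innermost pair (here s1 and s2) would be aligned first, and the resulting profile would then be aligned with s3.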

Software versions and implementations

Multiple iterations and ports of Clustal have been released, including ClustalV, ClustalW, the graphical ClustalX, and Clustal Omega, by teams associated with EMBL-EBI and academic collaborators. The main implementations are written in C and C++, with ClustalX providing a graphical front end, and the programs are integrated into platforms such as Galaxy and made available through community projects and code hosting services such as Bioconductor and GitHub. The Clustal family coexists with other alignment utilities distributed via repositories curated by the European Bioinformatics Institute, mirrored by organizations such as the National Center for Biotechnology Information, and deployed on institutional computing infrastructure for high-throughput use.

Applications and usage

Researchers apply Clustal to comparative analyses of sequences deposited in GenBank and annotated in UniProt, to structure inference referencing Protein Data Bank entries, and to evolutionary reconstructions of the kind published in Systematic Biology and Molecular Biology and Evolution. Workflows commonly pass Clustal output to tree estimation or alignment visualization tools. Clustal alignments also serve as input for motif discovery methods and for downstream analyses in pipelines hosted on platforms such as Galaxy or run from computational notebooks.
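Downstream tools usually need the alignment as one gapped string per sequence, whereas Clustal-format output is written in interleaved blocks. The following is a minimal sketch of a reader for that text format, assuming the usual layout: a first line beginning with "CLUSTAL", sequence lines of the form "name segment [count]", and conservation lines that start with whitespace. The function name parse_clustal and the sample alignment are made up for illustration; libraries such as Biopython provide more robust readers.

```python
def parse_clustal(text):
    """Parse Clustal-format alignment text into {name: gapped_sequence}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    seqs = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (these start with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a "name segment" line
        name, segment = fields[0], fields[1]
        # Blocks are interleaved, so append each segment to its sequence.
        seqs[name] = seqs.get(name, "") + segment
    return seqs


# Made-up two-block example in Clustal layout.
sample = (
    "CLUSTAL W (1.83) multiple sequence alignment\n"
    "\n"
    "seq1      ACGT-ACGT\n"
    "seq2      ACGTTACGT\n"
    "          **** ****\n"
    "\n"
    "seq1      GG\n"
    "seq2      GA\n"
    "          * \n"
)
alignment = parse_clustal(sample)
print(alignment["seq1"])  # ACGT-ACGTGG
print(alignment["seq2"])  # ACGTTACGTGA
```

Because Python dictionaries preserve insertion order, the sequences come back in the order in which they first appear in the file.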

Performance and benchmarks

Benchmarking studies comparing Clustal with contemporaneous tools have been carried out by groups at the European Bioinformatics Institute, the National Center for Biotechnology Information, and universities. These evaluations generally use curated reference alignments, such as the BAliBASE benchmark, often derived from structures in the Protein Data Bank and sequences in Swiss-Prot. Results typically report a trade-off between alignment accuracy and computational cost, as discussed in journals such as Bioinformatics and Nucleic Acids Research. High-throughput comparisons also measure running time and scaling on compute clusters.
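The text above does not say how alignment accuracy is measured. One commonly used measure in alignment benchmarks is a sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below assumes that definition, assumes both alignments contain the same ungapped sequences under the same names, and uses hypothetical function names (aligned_pairs, sum_of_pairs_score).

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.
    alignment maps name -> gapped sequence; all rows have equal length.
    Each residue is identified as (name, index in the ungapped sequence)."""
    names = sorted(alignment)
    length = len(alignment[names[0]])
    position = {name: 0 for name in names}  # next ungapped index per row
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, position[name]))
                position[name] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Made-up example: the test alignment places one gap a column too late.
reference = {"a": "AC-GT", "b": "ACTGT"}
test = {"a": "ACG-T", "b": "ACTGT"}
print(sum_of_pairs_score(test, reference))  # 0.75
```

A score of 1.0 means every residue pair aligned in the reference is reproduced. Computational cost would be measured separately, for example as wall-clock time on the same inputs.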

Limitations and alternatives

Clustal's progressive heuristic can suffer from error propagation: a misplaced gap introduced in an early alignment step is carried into every later stage and is generally not revisited. Accuracy also degrades on highly divergent sequence sets, a limitation noted in reviews and in research published in journals such as Genome Research. Alternatives and complementary tools include MAFFT, MUSCLE, T-Coffee, and PRANK, which have been implemented and benchmarked alongside Clustal by teams at EMBL-EBI and the National Center for Biotechnology Information. For large-scale and structure-aware alignments, newer methods and pipelines deployed on cloud platforms such as Amazon Web Services or Google Cloud Platform address some of Clustal's constraints.

Category:Bioinformatics