PhyML — LLMpedia

PhyML
Name	PhyML
Developer	Stéphane Guindon, Olivier Gascuel, et al.
Released	2003
Latest release	2010s
Operating system	Linux, macOS, Windows
License	GPL

Contents

Overview
Methods and Algorithms
Implementation and Features
Performance and Benchmarking
Applications and Use Cases
Development and History

PhyML is a software tool for estimating maximum likelihood phylogenies from nucleotide and amino-acid sequence alignments. It is widely used in molecular systematics, comparative genomics, and evolutionary biology, and its methods are grounded in statistical inference, model selection, and numerical optimization. It is commonly run alongside, and compared with, other phylogenetics software in computational biology and bioinformatics.

Overview

PhyML provides maximum likelihood estimation of phylogenetic trees from aligned sequence data sets and supports a range of substitution models for DNA and protein evolution. It complements other phylogenetic packages such as RAxML, MrBayes, BEAST, and MEGA, and it uses concepts familiar to users of PHYLIP and PAUP*. Its emphasis on speed and model flexibility makes it a common component of pipelines that also use BLAST, MAFFT, Clustal, MUSCLE, IQ-TREE, and FastTree.

Methods and Algorithms

PhyML searches for the maximum likelihood tree with heuristic topology rearrangements, namely nearest-neighbor interchange (NNI) and subtree pruning and regrafting (SPR), the latter also used in RAxML and GARLI. The search typically starts from a distance-based tree built with BioNJ, a variant of Neighbor-Joining, and the likelihood of each candidate tree is computed with Felsenstein's pruning algorithm. Supported substitution models include Jukes–Cantor, Kimura 2-parameter, HKY85, and GTR for nucleotides, and empirical matrices such as JTT, WAG, and LG for proteins. Rate heterogeneity among sites is modeled with a discretized gamma distribution and a proportion of invariant sites, as in other ML packages and in MrBayes and BEAST. Branch lengths and model parameters are estimated by numerical optimization, using routines related to Newton-Raphson and expectation-maximization that are comparable to those in PAML's codeml. Node support can be assessed with the nonparametric bootstrap introduced by Joe Felsenstein or with the approximate likelihood-ratio test (aLRT), whose values are often compared with posterior probabilities from Bayesian tools.
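To make the likelihood calculation concrete, the following is a minimal Python sketch of Felsenstein's pruning algorithm under the Jukes–Cantor model. It is an illustration only, not PhyML's C implementation: the tree encoding (nested tuples of child and branch-length pairs), the function names, and the toy sequences are all assumptions made for this example, and ambiguity codes, gaps, and rate heterogeneity are not handled.

```python
import math

BASES = "ACGT"

def jc69_prob(same, t):
    """Jukes-Cantor probability of ending in the same (or one specific
    different) base after a branch of length t substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partial(node, site):
    """Conditional likelihoods of the data below `node`, one per base.

    A leaf is a sequence string. An internal node is a tuple of
    (child, branch_length) pairs (hypothetical encoding for this sketch).
    """
    if isinstance(node, str):
        # Observed base gets likelihood 1, all other bases 0.
        return [1.0 if base == node[site] else 0.0 for base in BASES]
    result = [1.0] * 4
    for child, t in node:
        below = partial(child, site)
        for x in range(4):
            # Sum over the unknown state y at the child end of the branch.
            result[x] *= sum(jc69_prob(x == y, t) * below[y] for y in range(4))
    return result

def log_likelihood(tree, n_sites):
    """Log-likelihood of the alignment, assuming independent sites and
    equal base frequencies (0.25 each) at the root."""
    total = 0.0
    for site in range(n_sites):
        root = partial(tree, site)
        total += math.log(sum(0.25 * value for value in root))
    return total

# Toy rooted tree: ((ACGT:0.1, ACGA:0.1):0.05, ACTT:0.2)
cherry = (("ACGT", 0.1), ("ACGA", 0.1))
tree = ((cherry, 0.05), ("ACTT", 0.2))
print(log_likelihood(tree, 4))
```

A tree search such as NNI or SPR evaluates this kind of function repeatedly for candidate topologies; production software reuses partial likelihoods between candidates rather than recomputing them from scratch as this sketch does.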

Implementation and Features

Implemented in the C programming language, PhyML is distributed under the GNU General Public License and runs on Linux, macOS, and Microsoft Windows. It reads alignments in PHYLIP format, which can be produced from GenBank, EMBL, and UniProt derived data sets by alignment tools such as MAFFT, Clustal Omega, and MUSCLE. Output trees are written in Newick format, which is readable by viewers and editors including FigTree, Dendroscope, and TreeView, and PhyML can be integrated into workflow managers like Galaxy and into pipeline systems used at institutions such as EMBL-EBI, NCBI, and the Wellcome Sanger Institute. Features include model selection options comparable to ModelTest and ProtTest, branch support metrics also found in RAxML and IQ-TREE, and options for partitioned data sets analogous to those in PartitionFinder.
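As an illustration of preparing input, the sketch below formats a small alignment as sequential PHYLIP text and assembles a PhyML command line without running it. The helper name `to_phylip`, the file name `example.phy`, and the toy sequences are hypothetical. The flags shown (`-i` input file, `-d` data type, `-m` model, `-b -4` for SH-like branch support) follow the PhyML 3 command-line conventions as commonly documented, but they should be checked against `phyml --help` for the installed version.

```python
def to_phylip(alignment):
    """Format a {name: sequence} mapping as sequential PHYLIP text."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Pad names to a common width of at least 10 characters.
    width = max(10, max(len(name) for name in alignment))
    lines = [f" {len(alignment)} {n_sites}"]  # header: taxa count, site count
    for name, seq in alignment.items():
        if any(ch.isspace() for ch in name):
            raise ValueError(f"taxon name contains whitespace: {name!r}")
        lines.append(f"{name.ljust(width)} {seq}")
    return "\n".join(lines) + "\n"

# Toy alignment (made-up sequences for illustration only).
alignment = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGTACGA",
    "taxon_C": "ACTTACGA",
    "taxon_D": "ACTTACGG",
}
text = to_phylip(alignment)
print(text)

# Assumed invocation; built here as an argument list but not executed.
command = ["phyml", "-i", "example.phy", "-d", "nt", "-m", "GTR", "-b", "-4"]
print(" ".join(command))
```

To run the analysis, the text would be written to `example.phy` and the argument list passed to `subprocess.run`, assuming a `phyml` executable is on the search path.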

Performance and Benchmarking

PhyML has been benchmarked against contemporaries including RAxML, FastTree, IQ-TREE, MrBayes, and BEAST, using empirical data sets from resources like TreeBASE and simulated alignments generated with tools such as Seq-Gen and INDELible. Reports from research groups at Université de Montréal, CNRS, Université Paris-Sud, and consortia working with Genome Canada indicate that PhyML trades some search thoroughness for speed: it is often faster than Bayesian samplers such as MrBayes, and slower but more thorough in its tree search than FastTree on very large alignments. Comparative studies published in Nature and PLOS journals have used PhyML as a baseline for model-based phylogenetic inference, alongside pipelines developed in labs at Harvard University, the University of California, Berkeley, and the Max Planck Society.

Applications and Use Cases

PhyML has been applied in studies ranging from viral phylogenetics, using data sets from GISAID and GenBank, to bacterial comparative genomics in projects coordinated by the CDC and WHO. It is used in phylogeography studies related to HIV/AIDS research, in conservation genetics projects at institutions like the Smithsonian Institution and Kew Gardens, and in evolutionary analyses of model organisms at the Broad Institute, the Wellcome Sanger Institute, and universities such as Oxford and Cambridge. PhyML trees feed into workflows for detecting positive selection with PAML and HyPhy, and into macroevolutionary analyses with Mesquite and BEAST. It is frequently cited in phylogenomic studies from consortia like the Earth BioGenome Project and in biodiversity initiatives supported by NSF and European Research Council grants.

Development and History

PhyML was originally developed by Stéphane Guindon and Olivier Gascuel and released in 2003, building on methodological foundations laid by researchers such as Ziheng Yang, Joe Felsenstein, and Masatoshi Nei. Subsequent versions incorporated contributions from computational groups at Université de Montpellier, Université de Lyon, and collaborative teams linked to CNRS and INRIA. Its development coincided with major shifts in bioinformatics, such as the rise of high-throughput sequencing at the Broad Institute and the growth of data repositories like those at EMBL-EBI and GenBank. The software has been described in peer-reviewed publications coauthored by its developers and is cited in journals published by Oxford University Press, Wiley-Blackwell, and PLOS. Ongoing maintenance and community adoption have been supported by academic labs and bioinformatics core facilities at institutions including Institut Pasteur, the University of Toronto, and McGill University.

Category:Phylogenetics