MAKER — LLMpedia

MAKER
Name	MAKER
Released	2003
Latest release	2.31.10
Programming language	Perl, C, C++
Operating system	Unix, Linux, macOS
License	BSD-like

Contents

Overview
History
Design and Architecture
Applications and Use Cases
Performance and Evaluation
Community and Development
Criticisms and Limitations


MAKER is an open-source genome annotation pipeline that integrates evidence from ab initio gene predictors, transcript alignments, and protein homology to produce structural and functional annotations for eukaryotic genomes. Originally designed to make annotation practical for emerging model organism genomes, MAKER orchestrates tools such as BLAST, Exonerate, Augustus, and SNAP and reconciles their diverse evidence into gene models reported in GFF3. It is widely used by consortia and research groups working on genomes ranging from microbial eukaryotes to vertebrates, and its outputs are often used in workflows alongside resources such as Ensembl and NCBI.

Overview

MAKER is a modular pipeline that automates the integration of external software and databases to annotate genome assemblies. It accepts assembled contigs or scaffolds; transcript evidence such as expressed sequence tags (ESTs) and mRNA sequences from GenBank or RefSeq, or transcriptome assemblies produced by tools like Trinity or StringTie; and protein databases such as UniProt or organism-specific proteomes. MAKER produces standardized annotation outputs compatible with tools like Apollo and JBrowse, and it identifies and masks repeats with RepeatMasker, optionally using repeat libraries built with RepeatModeler. Its configuration-driven approach supports projects ranging from single-genome studies to large-scale initiatives such as the i5k Initiative and national reference genome programs.
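Repeat masking matters because repeats can otherwise be mistaken for genes by ab initio predictors and can generate spurious evidence alignments. The minimal Python sketch below is illustrative only and is not MAKER's code: it applies a list of repeat intervals (assumed here to be 0-based and half-open) to a sequence using either soft masking (lowercase bases) or hard masking (bases replaced by N).

```python
def mask_sequence(seq: str, repeats: list[tuple[int, int]], soft: bool = True) -> str:
    """Mask repeat intervals in a nucleotide sequence.

    Intervals are (start, end) pairs, assumed 0-based and half-open.
    Soft masking lowercases the bases; hard masking replaces them with 'N'.
    """
    chars = list(seq)
    for start, end in repeats:
        # Clamp to the sequence bounds so out-of-range intervals do not raise.
        start, end = max(0, start), min(len(chars), end)
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


if __name__ == "__main__":
    contig = "ACGTAAAAAAGGCC"  # toy sequence with a poly-A "repeat"
    repeats = [(4, 10)]        # hypothetical repeat coordinates
    print(mask_sequence(contig, repeats))              # ACGTaaaaaaGGCC
    print(mask_sequence(contig, repeats, soft=False))  # ACGTNNNNNNGGCC
```

Soft masking keeps the bases available so that alignments can still extend through repeats, whereas hard masking removes them from consideration entirely; which style suits a project depends on the downstream tools.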

History

Development of MAKER began in the 2000s to address the growing need for automated annotation as sequencing throughput increased, a need that became more pressing with second-generation platforms such as 454 pyrosequencing and Illumina. Early versions emphasized evidence alignment using BLAST and Exonerate and incorporated ab initio predictors such as GlimmerHMM and SNAP. Subsequent releases added support for training predictors with RNA-seq evidence, integration with annotation editing tools, and improved handling of fragmented assemblies generated by assemblers including SPAdes and SOAPdenovo. MAKER has been adopted by projects annotating genomes of taxa represented in resources such as FlyBase and WormBase, as well as by plant research communities associated with TAIR, reflecting its broad taxonomic applicability.

Design and Architecture

MAKER's architecture centers on a master controller that manages a set of evidence-processing modules. Each module invokes external programs, such as BLAST for transcript and protein searches, Exonerate for polishing spliced alignments, and gene predictors including Augustus and SNAP, and parses their outputs into a unified internal representation. Repeats are masked with RepeatMasker using libraries from Repbase or custom libraries, such as those identified de novo with RepeatModeler. MAKER reads its inputs, parameters, and executable paths from plain-text control files (maker_opts.ctl, maker_bopts.ctl, and maker_exe.ctl), and it writes gene models and evidence alignments in GFF3 alongside transcript and protein sequences in FASTA. Each gene model is assigned an AED (Annotation Edit Distance) score, which curators use to prioritize models for review before GenBank submission or import into resources such as Ensembl Genomes.
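The control files use simple key=value lines. The Python sketch below renders and parses options in that style. The option names shown (genome, est, protein, model_org, est2genome, protein2genome) are assumed to match those in the maker_opts.ctl template generated by `maker -CTL`, but they should be checked against the template produced by the installed version; the file paths are hypothetical, and the parser's comment handling is a simplifying assumption.

```python
def format_ctl(options: dict[str, object]) -> str:
    """Render options as 'key=value' lines in the style of a MAKER control file."""
    lines = []
    for key, value in options.items():
        # Write booleans as 1/0, the on/off convention assumed for the control files.
        if isinstance(value, bool):
            value = int(value)
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def parse_ctl(text: str) -> dict[str, str]:
    """Parse 'key=value' lines, skipping blank lines and '#' comments.

    Simplification: anything after a '#' is treated as a comment, so values
    containing '#' are not supported by this sketch.
    """
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


if __name__ == "__main__":
    # Hypothetical input files; option names should be checked against the
    # maker_opts.ctl template generated by the installed MAKER version.
    opts = {
        "genome": "assembly.fasta",
        "est": "transcripts.fasta",
        "protein": "proteins.fasta",
        "model_org": "all",
        "est2genome": True,
        "protein2genome": True,
    }
    text = format_ctl(opts)
    print(text)
    assert parse_ctl(text)["est2genome"] == "1"
```

Generating the file from a dictionary keeps run parameters in version-controlled code, which can help with the reproducibility concerns noted under Criticisms and Limitations.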

Applications and Use Cases

MAKER has been applied to annotate diverse organisms, including insect genomes in the i5k Initiative, plant genomes associated with Phytozome, vertebrate assemblies from projects such as the Vertebrate Genomes Project, and fungal genomes in FungiDB-linked studies. Researchers use MAKER to annotate both reference-quality assemblies and draft genomes, to train species-specific gene predictors for Augustus or SNAP, and to produce annotations that feed comparative genomics pipelines involving OrthoDB, OMA, or Ensembl Compara. MAKER also supports functional annotation by transferring putative gene descriptions from UniProtKB homologs and by incorporating domain search results such as those from InterProScan, from which Gene Ontology terms can be assigned.
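A common way to build a predictor training set is to keep only well-supported gene models. The Python sketch below filters GFF3 mRNA features on an `_AED` attribute. It assumes MAKER-style GFF3 in which each mRNA carries `_AED=<value>` in column 9, and the 0.25 cutoff is an illustrative choice rather than a MAKER default.

```python
from collections.abc import Iterable


def parse_attributes(field: str) -> dict[str, str]:
    """Split a GFF3 column-9 string ('k1=v1;k2=v2') into a dict."""
    attrs: dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def low_aed_models(gff_lines: Iterable[str], max_aed: float = 0.25) -> list[str]:
    """Return IDs of mRNA features whose _AED attribute is <= max_aed."""
    selected: list[str] = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        aed = attrs.get("_AED")
        if aed is not None and float(aed) <= max_aed:
            selected.append(attrs.get("ID", "."))
    return selected


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
        "ctg1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.08",
        "ctg1\tmaker\tmRNA\t1500\t2400\t.\t-\t.\tID=g2-RA;Parent=g2;_AED=0.61",
    ]
    print(low_aed_models(example))  # ['g1-RA']
```

The selected IDs could then be used to extract the corresponding models for predictor training; this sketch does not decode URL-escaped attribute values, which full GFF3 parsers handle.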

Performance and Evaluation

Annotation quality from MAKER is typically evaluated with AED distributions, BUSCO completeness assessments using lineage datasets derived from OrthoDB, and comparisons to community-curated gene sets from resources such as RefSeq and Ensembl. Runtime scales with available compute: MAKER supports MPI, and evidence searches and predictor jobs are commonly run in parallel on clusters managed by schedulers such as SLURM or SGE. Benchmarking studies comparing MAKER-derived annotations with those from pipelines such as BRAKER and PASA-centered workflows report trade-offs in sensitivity, specificity, and runtime that depend on input evidence quality and assembly contiguity.
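AED condenses agreement between a gene model and its supporting evidence into one number: with nucleotide-level sensitivity SN and specificity SP, AED = 1 - (SN + SP)/2, so 0 indicates full agreement and 1 indicates no overlap. The same SN and SP quantities underlie comparisons against curated reference gene sets. The Python sketch below illustrates the formula on exon intervals; it is not MAKER's implementation, and details such as which evidence MAKER pools for each model are not modeled.

```python
def covered_bases(intervals: list[tuple[int, int]]) -> set[int]:
    """Positions covered by 1-based, inclusive (start, end) intervals, e.g. exons."""
    bases: set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def sn_sp_aed(prediction: list[tuple[int, int]],
              evidence: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Nucleotide-level sensitivity, specificity, and AED.

    SN  = overlap / evidence bases     (TP / (TP + FN))
    SP  = overlap / prediction bases   (TP / (TP + FP))
    AED = 1 - (SN + SP) / 2            (0 = full agreement, 1 = no overlap)
    """
    pred = covered_bases(prediction)
    evid = covered_bases(evidence)
    if not pred or not evid:
        return 0.0, 0.0, 1.0
    overlap = len(pred & evid)
    sn = overlap / len(evid)
    sp = overlap / len(pred)
    return sn, sp, 1.0 - (sn + sp) / 2.0


if __name__ == "__main__":
    # Hypothetical two-exon model and a single supporting transcript alignment.
    model = [(1, 100), (201, 300)]
    transcript = [(51, 300)]
    sn, sp, aed = sn_sp_aed(model, transcript)
    print(f"SN={sn:.3f} SP={sp:.3f} AED={aed:.3f}")  # SN=0.600 SP=0.750 AED=0.325
```

The set-based approach is easy to read but uses memory proportional to feature length; an interval-sweep implementation would scale better to whole-genome comparisons.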

Community and Development

MAKER development has been driven by academic groups and bioinformatics communities contributing code, documentation, and training materials. Users coordinate through mailing lists, issue trackers, and workshops at conferences such as ISMB and genetics society meetings. Integrations and wrappers connect MAKER with workflow systems such as Snakemake, Nextflow, and Cromwell, supporting reproducible pipelines in the cloud and high-performance computing environments used by consortia such as ELIXIR member organizations and national genomics centers.

Criticisms and Limitations

Critiques of MAKER include its dependence on specific versions of many external programs, which can complicate reproducibility without containerization tools such as Docker or Singularity, and reduced performance on highly fragmented assemblies produced by short-read assemblers such as Velvet. Some users report that training ab initio predictors requires substantial manual intervention and expertise compared with more fully automated approaches such as BRAKER2. Additionally, the quality of functional annotations depends on available homology evidence from databases such as UniProt and on curated gene sets, limiting effectiveness for deeply divergent or poorly sampled clades.

Category:Genome annotation software