LLMpedia: The first transparent, open encyclopedia generated by LLMs

USEARCH

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: USEARCH
Developer: Edgar RC
Initial release: 2010
Latest release: 2016
Operating system: Unix-like; Windows (via compatibility)
License: Proprietary (freemium)

USEARCH is a closed-source bioinformatics tool for high-throughput sequence analysis, widely cited in microbial ecology, metagenomics, phylogenetics, and environmental genomics. It provides fast sequence clustering, chimera detection, dereplication, and database search, and has been used in work connected to major projects and institutions such as the Human Microbiome Project, the Earth Microbiome Project, the Department of Energy, and research groups at Harvard University, Stanford University, and the Max Planck Society. Its performance is often compared with software from the European Bioinformatics Institute, the National Center for Biotechnology Information, the Wellcome Sanger Institute, and developers associated with the Broad Institute and EMBL.

Overview

USEARCH implements sequence analysis routines for amplicon and shotgun datasets generated on platforms from Illumina, Roche 454, Pacific Biosciences, and Oxford Nanopore Technologies. It provides operational taxonomic unit (OTU) clustering, exact sequence variant handling, and chimera filtering. These steps are called from pipelines such as QIIME and mothur and from workflows run on Galaxy, XSEDE, and cloud providers such as Amazon Web Services and Google Cloud Platform. The tool is often cited in studies that assign taxonomy against the SILVA, Greengenes, and RDP (Ribosomal Database Project) reference databases, and in analyses published in journals such as Nature, Science, Nature Methods, PLoS ONE, and ISME Journal.
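Dereplication, the step that collapses identical reads before clustering or variant calling, can be illustrated with a short Python sketch. This is a generic illustration, not USEARCH's own code: the function name `dereplicate` and the toy reads are hypothetical, and the sketch assumes reads are plain strings that match only when they are identical over their full length (ignoring case).

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs.

    Hypothetical helper for illustration only. Reads are upper-cased so
    that case differences do not split otherwise identical sequences.
    """
    counts = Counter(seq.upper() for seq in reads)
    # Most abundant first; ties are broken alphabetically so the
    # output order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input: six reads, three distinct sequences.
reads = ["ACGT", "acgt", "ACGA", "ACGT", "TTTT", "ACGA"]
unique = dereplicate(reads)

for seq, size in unique:
    print(f"{seq}\tsize={size}")
```

Sorting by decreasing abundance matters for the greedy clustering that typically follows, because abundant sequences are then considered first as cluster centers.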

History and Development

Development began in the late 2000s amid rising demand from projects like the Human Microbiome Project and initiatives at the Broad Institute and DOE Joint Genome Institute. Its author released incremental versions through the early 2010s while academic groups at University of Colorado, University of California, San Diego, University of Illinois Urbana-Champaign, and University of Michigan adopted it for large-scale 16S rRNA surveys. Comparative evaluations set it against algorithms from teams at EMBL-EBI and NCBI and from academic labs at Yale University, Princeton University, Massachusetts Institute of Technology, and Johns Hopkins University.

Algorithms and Features

USEARCH implements greedy clustering, heuristic k-mer based indexing, and pairwise alignment, with accelerations inspired by the BLAST work at NCBI and by hashing strategies used by groups at University of California, Santa Cruz. Core features include dereplication, global and local alignment, chimera detection with the UCHIME algorithm, and centroid-based OTU picking, which has been widely used in studies at University of Wisconsin–Madison, Cornell University, and University of Texas at Austin. Its outputs are compatible with taxonomic assignment against SILVA and Greengenes and with the naive Bayesian classifier of the Ribosomal Database Project, as used in projects at University of Oslo and Karolinska Institutet.
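The combination of k-mer screening and greedy centroid-based clustering described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the USEARCH implementation: the function names, the word length `k=4`, the `min_shared` cutoff, and the use of edit distance as a stand-in for alignment-based identity are all choices made for this example. The input is assumed to be already sorted so that preferred centroids (for instance, the most abundant sequences) come first.

```python
def kmers(seq, k=4):
    """Return the set of overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def identity(a, b):
    """Stand-in for alignment identity: 1 - edit distance / longer length."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.85, k=4, min_shared=1):
    """Assign each sequence to the first sufficiently similar centroid.

    Centroids are tried in order of decreasing shared k-mer count, and
    centroids sharing fewer than min_shared words are never aligned.
    A sequence that matches no centroid becomes a new centroid.
    """
    centroids = []  # list of (centroid sequence, its k-mer set)
    clusters = {}   # centroid sequence -> list of member sequences
    for seq in seqs:
        words = kmers(seq, k)
        candidates = sorted(((len(words & cw), c) for c, cw in centroids),
                            key=lambda pair: -pair[0])
        assigned = False
        for shared, centroid in candidates:
            if shared < min_shared:
                break  # remaining candidates share even fewer words
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                assigned = True
                break
        if not assigned:
            centroids.append((seq, words))
            clusters[seq] = [seq]
    return clusters


seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "TTTTGGGGCA"]
clusters = greedy_cluster(seqs)
for centroid, members in clusters.items():
    print(centroid, members)
```

The k-mer screen is what keeps the greedy pass cheap: the expensive comparison runs only against centroids that already share words with the query, and the most promising centroid is tried first.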

Licensing and Availability

USEARCH is distributed under a proprietary license with freemium terms and commercial options, in contrast to the fully open-source projects released by groups at EMBL, Cold Spring Harbor Laboratory, and the Broad Institute. Academic users at institutions like University of Oxford, University of Cambridge, Imperial College London, and ETH Zurich often had to arrange site licenses or rely on legacy binaries installed on university and national supercomputing clusters. Its licensing has been discussed by users on GitHub and Bioconductor and on community platforms such as SEQanswers and Stack Overflow.

Performance and Benchmarks

Benchmarks reported by independent groups at Lawrence Berkeley National Laboratory, the Joint Genome Institute, and university labs compared USEARCH to tools such as VSEARCH, CD-HIT, and BLAST+, and to newer exact-variant pipelines such as DADA2 (developed at Stanford University) and Deblur (linked to UC San Diego). Studies published in venues including Bioinformatics, BMC Bioinformatics, and Frontiers in Microbiology evaluated speed, memory consumption, clustering accuracy, and chimera detection rates across datasets from the Human Microbiome Project, the Earth Microbiome Project, and environmental surveys run by researchers at Woods Hole Oceanographic Institution and Scripps Institution of Oceanography.

Applications and Use Cases

USEARCH has been applied in microbial community profiling in clinical studies at Mayo Clinic and Cleveland Clinic, environmental monitoring projects linked to the United States Geological Survey, agricultural microbiome research at United States Department of Agriculture laboratories, and conservation genomics efforts with partners such as the Smithsonian Institution. It appears in workflows that combine assemblers such as SPAdes, read mappers such as Bowtie and BWA, downstream statistical analysis in R, and visualization platforms such as Cytoscape and QGIS. Researchers at institutions including University of Minnesota, Pennsylvania State University, Duke University, University of Washington, and University of British Columbia have used it for projects spanning pathogen surveillance, biogeography, and microbiome-host interaction studies.

Category:Bioinformatics software