TreeBASE — LLMpedia

TreeBASE
Name	TreeBASE
Type	Biological database
Scope	Phylogenetic data repository
Discipline	Systematics
Country	United States
Established	1994
Maintained by	University of California, Riverside; originally by the University of Kansas and State University of New York

Contents

Overview
History and Development
Scope and Content
Data Submission and Curation
Access and Tools
Impact and Usage in Research

TreeBASE is a public repository for phylogenetic data, primarily nucleotide matrices, morphological matrices, and published phylogenetic trees. It serves researchers in systematics and phylogenetics by preserving data associated with publications and supporting reproducible analyses. TreeBASE interoperates with journals, funding agencies, and informatics platforms to link empirical datasets to manuscripts and downstream tools.

Overview

TreeBASE aggregates phylogenetic matrices, character codings, and inferred trees submitted by authors associated with institutions such as the Smithsonian Institution, University of California, Riverside, University of Kansas, State University of New York, and research centers like the Natural History Museum, London. The repository complements resources such as GenBank, Dryad Digital Repository, MorphoBank, PANGAEA, BOLD Systems, and GBIF by focusing on inferred phylogenetic hypotheses and their input matrices. TreeBASE supports workflows used by researchers at organizations including the Royal Botanic Gardens, Kew, Max Planck Society, Australian National University, and museums like the American Museum of Natural History.

History and Development

Development began in the 1990s amid initiatives at institutions including the University of Kansas and the State University of New York at Stony Brook. It was shaped by community discussions involving stakeholders such as the Society of Systematic Biologists, the International Barcode of Life Consortium, and editors of journals like Systematic Biology, Molecular Phylogenetics and Evolution, and Cladistics. Funding and technical support have involved agencies such as the National Science Foundation and the National Institutes of Health, along with collaborations with the Tree of Life Web Project and the Encyclopedia of Life. Over time TreeBASE integrated standards promoted by groups like the Darwin Core community and the Consortium for the Barcode of Life, while adapting to new software ecosystems exemplified by RAxML, MrBayes, BEAST, and PAUP*.

Scope and Content

TreeBASE stores small to large-scale phylogenetic matrices produced by research teams from institutions like Harvard University, Yale University, Stanford University, and the University of Chicago, and by participants in research networks such as the Global Biodiversity Information Facility. Content covers organismal groups studied at centers like the Smithsonian Tropical Research Institute, including clades treated in monographs from publishers like Oxford University Press and Cambridge University Press. The repository includes submissions tied to high-profile studies published in venues including Nature, Science, Proceedings of the National Academy of Sciences, and specialist journals such as Journal of Biogeography and New Phytologist. TreeBASE entries often follow nomenclatural rules such as the International Code of Nomenclature for algae, fungi, and plants and the standards of the International Commission on Zoological Nomenclature.

Data Submission and Curation

Submitting authors from universities such as the University of Michigan, University of California, Berkeley, University of Texas at Austin, and research institutes like the Max Planck Institute for Evolutionary Anthropology upload matrices, metadata, and tree files. They follow guidelines informed by organizations including the Alliance for Taxonomic Transparency and policies adopted by publishers such as PLOS, Wiley-Blackwell, and Springer Nature. Curation workflows reference file formats such as NEXUS, NeXML, and PhyloXML, and metadata schemas influenced by DataCite and the Open Archives Initiative. Curators and administrators coordinate with professional societies such as the Society for the Preservation of Natural History Collections and institutional repositories at places like the University of Florida and Cornell University.
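As an illustration of the kind of file involved, the following is a minimal sketch of reading taxon labels and tree statements from a simple NEXUS document. The function name `parse_nexus` and the sample data are hypothetical and are not part of any TreeBASE software. The sketch assumes unquoted labels and no nested comments, and it ignores `TRANSLATE` tables, so it is not a complete NEXUS parser.

```python
import re


def parse_nexus(text):
    """Return (taxon_labels, trees) from a simple NEXUS document.

    Illustrative only: handles unquoted labels and non-nested comments.
    """
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file: missing #NEXUS header")
    # Drop square-bracket comments such as [&R]; nested comments are not handled.
    text = re.sub(r"\[[^\]]*\]", "", text)
    taxa = []
    trees = {}
    # Blocks have the form: BEGIN <name>; ... END;
    for block in re.finditer(r"(?is)\bbegin\s+(\w+)\s*;(.*?)\bend\s*;", text):
        name, body = block.group(1).lower(), block.group(2)
        if name == "taxa":
            labels = re.search(r"(?is)\btaxlabels\b(.*?);", body)
            if labels:
                taxa = labels.group(1).split()
        elif name == "trees":
            # Each statement has the form: TREE <name> = <newick>;
            for tree in re.finditer(r"(?is)\btree\s+(\S+)\s*=\s*(.*?);", body):
                trees[tree.group(1)] = tree.group(2).strip() + ";"
    return taxa, trees


# Hypothetical sample data, not an actual TreeBASE record.
SAMPLE = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=3;
  TAXLABELS Homo_sapiens Pan_troglodytes Gorilla_gorilla;
END;
BEGIN TREES;
  TREE consensus = [&R] ((Homo_sapiens,Pan_troglodytes),Gorilla_gorilla);
END;
"""

if __name__ == "__main__":
    labels, tree_map = parse_nexus(SAMPLE)
    print(labels)
    print(tree_map)
```

For real submissions, an established library such as DendroPy or Biopython's `Bio.Phylo` is a safer choice than hand-written parsing, since NEXUS allows quoting, nested comments, and translation tables that this sketch does not cover.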

Access and Tools

TreeBASE provides programmatic interfaces and web services that integrate with analysis software developed by teams at labs like the University of Washington and with platforms hosted by companies such as Elsevier. It supports workflows that use tools like Mesquite, IQ-TREE, and FastTree, and visualization packages such as Dendroscope and FigTree. Users from consortia like the Genomic Standards Consortium and projects such as Open Tree of Life access TreeBASE content through APIs and data pipelines that interoperate with infrastructures including HPC centers at institutions like Lawrence Berkeley National Laboratory and cloud services used by the European Molecular Biology Laboratory. Data discovery relies on indexing strategies similar to those used by PubMed, Google Scholar, and library systems at institutions like the Library of Congress.

Impact and Usage in Research

Researchers at universities such as Princeton University, Columbia University, University of Oxford, and University of Cambridge, and at institutions like the Royal Society, use TreeBASE-hosted matrices and trees to reproduce analyses, conduct meta-analyses, and synthesize large-scale phylogenies for projects like the Open Tree of Life and for systematic reviews published in journals such as Annual Review of Ecology, Evolution, and Systematics. TreeBASE-enabled reproducibility has influenced data policies at publishers including BioMed Central and funders such as the Wellcome Trust. Studies at museums like the Natural History Museum, Los Angeles County and botanical gardens such as Missouri Botanical Garden have used TreeBASE records to reassess taxonomic hypotheses, calibrate molecular clocks with methods developed in BEAST2, and integrate morphological matrices in comparative analyses using the R packages geiger and ape.
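A common first step when combining trees from different studies is to check how much their taxon sets overlap. The sketch below illustrates that idea with two hypothetical helper functions, `tip_labels` and `taxon_overlap`, applied to made-up Newick strings. It assumes unquoted tip labels and trees with at least one pair of parentheses, and it is not drawn from any TreeBASE or Open Tree of Life code.

```python
import re


def tip_labels(newick):
    """Return the set of tip labels in a Newick string (unquoted labels only)."""
    # A tip label directly follows '(' or ','.
    # Internal node labels and support values follow ')' and are skipped.
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def taxon_overlap(newick_a, newick_b):
    """Return the Jaccard index of the two trees' tip label sets."""
    a, b = tip_labels(newick_a), tip_labels(newick_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Made-up trees standing in for records from two different studies.
    study_one = "((A:0.1,B:0.2)0.95:0.3,C:0.4);"
    study_two = "((B,C),D);"
    print(sorted(tip_labels(study_one)))
    print(taxon_overlap(study_one, study_two))
```

A Jaccard index near zero suggests the two trees share too few taxa to be compared or merged directly. In practice, taxon names usually need reconciliation against a common taxonomy before such a comparison is meaningful.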
