Taverna (software)

Name	Taverna
Developer	EMBL-EBI, University of Manchester, CRG and community contributors
Released	2004
Programming language	Java
Operating system	Cross-platform
Genre	Scientific workflow
License	LGPL (Taverna 2.x); Apache License 2.0 (Apache Taverna)

Contents

Overview
Features and Architecture
Workflow Authoring and Execution
Integrations and Supported Services
Development History and Versions
Adoption and Use Cases
Licensing and Community Ecosystem

Taverna is an open-source scientific workflow management system for composing, orchestrating and executing distributed computational and data services. It enables researchers in fields such as bioinformatics, chemistry, ecology, astronomy and systems biology to integrate heterogeneous resources, including web services, command-line tools and databases, into reproducible analysis pipelines. Originally produced by a consortium of academic institutions, it has been used alongside other platforms in collaborative projects and infrastructure initiatives.

Overview

Taverna provides a graphical workbench and a server for designing and running workflows that coordinate SOAP and REST web services, resources such as BioMart and UniProt, and local tools. It was conceived to address interoperability challenges faced by researchers using distributed resources hosted by organizations such as EMBL-EBI and NCBI and by national e-infrastructures. The project emphasizes provenance capture, reproducibility and the sharing of computational experiments within communities associated with projects such as myGrid, GEN2PHEN and ELIXIR.

Features and Architecture

The architecture separates a desktop workbench for authoring from a server component for remote execution. The workbench offers a canvas for composing workflows graphically, discovering services and inspecting intermediate results; the server provides queuing, throttling and remote invocation of workflows submitted through SOAP or REST APIs. Core features include iterative and nested dataflows, service orchestration, provenance recording compatible with standards such as W3C PROV, error handling and data streaming. The system is implemented in Java and relies on libraries and standards used in projects at institutions such as the University of Manchester, Manchester Informatics groups and collaborators at the CRG.
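
The queuing and throttling behaviour described above can be illustrated with a minimal sketch. It is not Taverna Server code (the server itself is written in Java), and the name `run_throttled` and its `max_concurrent` parameter are hypothetical, chosen only for this example. The sketch assumes that each workflow run can be modelled as a plain callable taken from a first-in, first-out queue.

```python
import queue
import threading


def run_throttled(jobs, max_concurrent=2):
    """Run callables from a FIFO queue with at most max_concurrent in flight.

    Illustrative only: 'jobs' stands in for submitted workflow runs.
    Returns the results in submission order and the peak concurrency observed.
    """
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))

    results = [None] * len(jobs)
    in_flight = 0
    peak = 0
    lock = threading.Lock()

    def worker():
        nonlocal in_flight, peak
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            try:
                results[index] = job()
            finally:
                with lock:
                    in_flight -= 1

    # The number of worker threads is the throttle: no more than
    # max_concurrent jobs can execute at the same time.
    workers = [threading.Thread(target=worker) for _ in range(max_concurrent)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()
    return results, peak
```

Capping the number of workers is one simple way to throttle; a production server would also need persistence, per-user limits and failure reporting, none of which are modelled here.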

Workflow Authoring and Execution

Users author workflows in the Taverna workbench by dragging and connecting processors that encapsulate operations such as web service invocation, script execution or data transformation; processors can wrap SOAP services, REST endpoints, local scripts and command-line tools. The engine supports data-parallel fan-out/fan-in, nested loops and conditional branching, enabling the complex pipelines used in domains such as phylogenetics and metagenomics. Execution can be performed locally in the workbench or remotely on the Taverna Server, whose endpoints can be called from other workflow management systems and support integration with platforms and portal frameworks such as Galaxy and KNIME and with science gateways developed within NERC- or EPSRC-funded consortia. Provenance capture follows W3C standards, and workflows can be shared through registries such as myExperiment.
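
The fan-out/fan-in pattern mentioned above can be sketched as follows. This is an illustration of the general dataflow idea rather than Taverna's engine or API: the functions `gc_fraction`, `fan_out` and `run_pipeline` are hypothetical names, and the "processor" is an ordinary Python function standing in for a service call.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(sequence):
    """Hypothetical single-item processor: fraction of G and C bases."""
    if not sequence:
        return 0.0
    gc_count = sum(1 for base in sequence.upper() if base in "GC")
    return gc_count / len(sequence)


def fan_out(processor, items, max_workers=4):
    """Apply a single-item processor to every list element in parallel.

    pool.map returns results in the order of the inputs, so the output
    list lines up with 'items' even if calls finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(processor, items))


def run_pipeline(sequences):
    """Fan out over the inputs, then fan in by averaging the results."""
    per_sequence = fan_out(gc_fraction, sequences)
    mean = sum(per_sequence) / len(per_sequence) if per_sequence else 0.0
    return per_sequence, mean
```

Calling `run_pipeline(["GGCC", "ATAT", "GATC"])` processes the three sequences independently and then combines the per-sequence values into a single summary, which is the shape of a data-parallel step followed by an aggregation step.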

Integrations and Supported Services

Taverna integrates with a wide range of domain resources and middleware: bioinformatics services like BioMart, Ensembl, UniProt; cheminformatics resources such as PubChem; geospatial services using OGC standards; and general web resources exposed via SOAP and REST. Connectors and plugins support R script execution, Python scripts, Perl, containerized tools via Docker, and message queuing through middleware projects used in e‑infrastructure. Service discovery can leverage registries such as UDDI and community hubs like myExperiment, while authentication and authorization can integrate with institutional identity providers and federations like eduGAIN.
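
As a rough illustration of how a command-line tool can be wrapped so that it behaves like any other workflow step, the sketch below turns an argument vector into a callable that sends its input on standard input and returns standard output. It is an assumption-laden example, not Taverna's tool connector: `command_line_processor` and `to_upper` are hypothetical names, and the wrapped "tool" is simply the current Python interpreter running a one-line script.

```python
import subprocess
import sys


def command_line_processor(argv):
    """Wrap an external command as a text-in, text-out callable.

    Illustrative only: input is passed on stdin, output is read from
    stdout, and a non-zero exit status raises CalledProcessError.
    """
    def processor(input_text):
        completed = subprocess.run(
            argv,
            input=input_text,
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout

    return processor


# Hypothetical tool: the Python interpreter upper-casing its stdin.
to_upper = command_line_processor(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)
```

Because the wrapper returns an ordinary function, a wrapped tool can be chained with other steps in the same way as a service call or a script.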

Development History and Versions

Development traces to the myGrid project in the early 2000s, with initial releases emerging from collaborations involving the University of Manchester and the EMBL-EBI. Major version milestones delivered improvements in user experience, engine scalability and server capabilities; releases often coincided with collaborations in EU-funded projects such as e-BioGrid and GEN2PHEN. The project drew on academic groups experienced in workflow systems research, including teams associated with Manchester Informatics, CRG, and partner laboratories across Europe. Over time the codebase incorporated community contributions, shifted to modern dependency management and aligned provenance features with W3C recommendations.

Adoption and Use Cases

Taverna has been adopted by research groups in bioinformatics, cheminformatics, ecology and digital humanities for tasks including sequence analysis, annotation pipelines, data integration and reproducible publishing of computational experiments. It has been used in consortia and infrastructures such as ELIXIR, integrated into teaching modules at universities such as the University of Manchester, and featured in workflows shared via myExperiment for reuse by communities working on problems in genomics, proteomics and metabolomics. Several projects have combined Taverna with grid and cloud resources managed by platforms such as Apache Mesos or by institutional HPC centers to scale batch processing of large datasets.

Licensing and Community Ecosystem

Taverna 2.x was distributed under the GNU Lesser General Public License (LGPL) and Apache Taverna under the Apache License 2.0, enabling academic and industrial users to inspect, modify and redistribute the software. Development has been supported by academic grants from agencies such as EPSRC and from European programmes, with an open community contributing plugins, connectors and documentation. Community resources for sharing workflows and knowledge include repositories such as myExperiment and developer forums hosted by participating institutions such as EMBL-EBI and the University of Manchester; the ecosystem also includes integrations with other workflow platforms, university teaching materials, and examples drawn from projects in bioinformatics and allied domains.
