LLMpedia: The first transparent, open encyclopedia generated by LLMs

Kepler (software)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Kepler (software)
Name: Kepler
Developer: Kepler Project
Released: 2005
Programming language: Java
Operating system: Windows, macOS, Linux
Platform: Java Virtual Machine
License: BSD

Kepler is an open-source scientific workflow system for designing, executing, and sharing computational pipelines that integrate data, models, and tools across disciplines. It provides a visual environment for composing workflows from components drawn from diverse repositories, supporting reproducible research and collaboration across institutional and disciplinary boundaries.

Overview

Kepler originated as a community-driven project to support researchers who need complex data integration, model coupling, and provenance tracking across heterogeneous computing environments. It builds on the Ptolemy II framework and shares concepts with contemporary workflow systems such as Taverna and Pegasus. It interoperates with HPC centers and with domain-specific infrastructures such as EarthScope, CIPRES Science Gateway, and iPlant Collaborative, and has received support from funders including the National Science Foundation. The platform targets users in astronomy, earth science, ecology, bioinformatics, and climate science, offering connectors to services from NASA, NOAA, and USGS and to research infrastructures like XSEDE.

Features and Architecture

Kepler implements a modular, actor-oriented architecture derived from the Ptolemy II framework developed at the University of California, Berkeley: workflows are composed of actors (reusable processing components), and a director determines how and when those actors execute. The design also draws on interoperability patterns from efforts such as OpenMI and the standards of the Open Geospatial Consortium (OGC). Core features include a graphical workflow editor, a provenance capture system compatible with W3C PROV, and a plugin mechanism for external components such as R, Python, MATLAB, and NetCDF libraries. The runtime supports execution on local workstations, on distributed clusters managed by SLURM or PBS, and, via adapters, on cloud platforms including Amazon Web Services and Google Cloud Platform. Kepler’s component model encapsulates domain-specific actors for data sources such as MODIS, Landsat, and GenBank and for tools such as BLAST, ArcGIS, and GDAL, enabling end-to-end pipelines that integrate observational datasets, simulation models, and statistical analysis carried out in environments such as RStudio and Jupyter Notebook. The system’s provenance viewer links to citation and metadata standards such as Dublin Core and, for geospatial metadata, ISO 19115.
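The actor-oriented idea described above can be illustrated with a minimal sketch. It is written in Python for brevity and does not use the Kepler or Ptolemy II API; the names Actor, Director, fire, schedule, and run are hypothetical and chosen only for illustration. It assumes a simple acyclic workflow in which each actor fires once, after all of its upstream actors have produced a token.

```python
from collections import deque


class Actor:
    """Hypothetical actor: consumes one token per input, emits one token."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func            # computation performed when the actor fires
        self.inputs = list(inputs)  # names of the upstream actors feeding this one

    def fire(self, tokens):
        # Pass the upstream tokens to the wrapped function in input order.
        return self.func(*[tokens[src] for src in self.inputs])


class Director:
    """Hypothetical director: fires every actor once, in dependency order."""

    def __init__(self, actors):
        self.actors = {a.name: a for a in actors}

    def schedule(self):
        # Kahn's algorithm: compute a topological order of the actor graph.
        pending = {name: len(a.inputs) for name, a in self.actors.items()}
        ready = deque(sorted(n for n, k in pending.items() if k == 0))
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for other in self.actors.values():
                if name in other.inputs:
                    pending[other.name] -= other.inputs.count(name)
                    if pending[other.name] == 0:
                        ready.append(other.name)
        if len(order) != len(self.actors):
            raise ValueError("workflow graph contains a cycle")
        return order

    def run(self):
        tokens = {}
        for name in self.schedule():
            tokens[name] = self.actors[name].fire(tokens)
        return tokens


# Illustrative three-actor workflow: source -> scale -> mean.
workflow = Director([
    Actor("source", lambda: [3.0, 4.0, 5.0]),
    Actor("scale", lambda xs: [2 * x for x in xs], inputs=["source"]),
    Actor("mean", lambda xs: sum(xs) / len(xs), inputs=["scale"]),
])
result = workflow.run()
```

Keeping the scheduling logic in the director rather than in the actors is the design point being illustrated: the same actors could, in principle, be reused under a director with a different execution policy.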

History and Development

The Kepler project began in the early 2000s as a collaboration among institutions including the University of California campuses at Davis, Santa Barbara, and San Diego, with funding from agencies such as the National Science Foundation and the Department of Energy. Influences included the Ptolemy II system at UC Berkeley and workflow engines such as Taverna, developed by the myGrid project led by the University of Manchester with the European Bioinformatics Institute. Early releases targeted workflow reproducibility and model integration for initiatives such as LTER and NEON, in partnership with centers like SDSC and NCSA. Over successive versions, contributors from institutions such as the University of New Mexico, UCLA, and UC San Diego added features for provenance, distributed execution, and domain-specific actor libraries. Governance transitioned through community steering groups, with code contributions managed first through SourceForge-style hosting and later through GitHub-style workflows. The project received recognition in workshops held at conferences including the AGU Fall Meeting, Gordon Research Conferences, and ACM SIGMOD meetings.

Use Cases and Applications

Researchers have applied Kepler to couple atmospheric models from NOAA with hydrology models from USGS in studies of flood risk, and to integrate MODIS satellite products with ecosystem models used by NEON and LTER scientists. In bioinformatics, pipelines have been implemented that orchestrate GenBank retrieval, sequence alignment with BLAST, and downstream analysis in Bioconductor. Climate scientists have used Kepler to chain models such as WRF and CESM while capturing provenance for intercomparison projects associated with CMIP. Astronomers have integrated data services from the Sloan Digital Sky Survey and the NASA Exoplanet Archive into reproducible analysis workflows. Ecologists and conservation groups use Kepler to combine data from observation networks such as NEON and LTER with statistical models in R, generating reports for stakeholders including NOAA and the U.S. Fish and Wildlife Service. Kepler has also been used in education initiatives at institutions such as California State University and the University of Arizona to teach computational thinking and reproducible research practices.
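The pattern common to these use cases, a chain of steps with provenance captured for each one, can be sketched as follows. This is an illustrative Python example under stated assumptions, not Kepler's provenance implementation and not a W3C PROV serialization: the functions digest and run_with_provenance and the record fields are hypothetical, and the three steps are stand-ins for real retrieval and analysis tools.

```python
import hashlib
import json


def digest(value):
    """Stable SHA-256 digest of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_with_provenance(steps, initial):
    """Run steps in sequence, recording what each one consumed and produced."""
    records = []
    value = initial
    for name, func in steps:
        result = func(value)
        records.append({
            "step": name,
            "input_digest": digest(value),
            "output_digest": digest(result),
        })
        value = result
    return value, records


# Stand-in steps; a real pipeline would call external tools or services here.
steps = [
    ("fetch", lambda ids: {i: "ACGT" * n for n, i in enumerate(ids, start=1)}),
    ("measure", lambda seqs: {i: len(s) for i, s in seqs.items()}),
    ("summarize", lambda lengths: max(lengths.values())),
]
final, trail = run_with_provenance(steps, ["seq1", "seq2"])
```

Because each record stores a digest of its input and its output, the output digest of one step matches the input digest of the next, so the trail can be checked later to confirm which intermediate data a result was derived from.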

Reception and Impact

Kepler has been cited in journals such as Science, Nature Communications, Journal of Geophysical Research, and Bioinformatics for enabling reproducible computational experiments and facilitating interdisciplinary collaboration. Evaluations at venues such as ESIP Federation meetings and IEEE e-Science workshops have highlighted its strengths in provenance capture and extensibility, while noting competition from systems such as Galaxy, Apache Airflow, and tools based on the Common Workflow Language (CWL). Its impact includes influencing standards for workflow provenance and encouraging institutions such as NCAR and NASA to adopt reproducible workflow practices. Kepler’s community-driven development model informed other open-source initiatives at laboratories such as LANL and ORNL, and its actor libraries contributed to domain gateway projects funded by agencies such as the National Institutes of Health and the DOE Office of Science.

Category:Scientific workflow systems