| Galaxy (software) | |
|---|---|
| Name | Galaxy |
| Developer | Galaxy Project |
| Released | 2005 |
| Latest release | 23.0 (example) |
| Programming language | Python, JavaScript |
| Operating system | Linux, macOS |
| Genre | Scientific workflow, Bioinformatics |
| License | Academic Free License |
Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational research, primarily used in bioinformatics, genomics, and computational biology. The project integrates analysis tools, workflow management, data provenance, and collaboration capabilities to support users ranging from laboratory scientists to computational researchers and educators at institutions such as the University of Pennsylvania, Penn State University, and the University of Freiburg, as well as in international consortia.
Galaxy provides a graphical user interface and an API that allow users to run command-line tools and compose workflows without direct interaction with Unix shells or high-performance computing schedulers. The platform emphasizes reproducibility through automatic tracking of tool parameters, dataset versions, and provenance metadata, aligning with initiatives like the FAIR data principles and programs supported by funders such as the National Institutes of Health, European Molecular Biology Laboratory, and Wellcome Trust. Galaxy instances range from single-server deployments to federated infrastructures used by projects at European Bioinformatics Institute, Australian National University, and national research infrastructures.
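As a rough illustration of the API access described above, the following Python sketch builds (but does not send) two HTTP requests against a Galaxy-style REST API using only the standard library. The server URL and key are placeholders, `build_request` is a hypothetical helper, and the `/api/histories` path and `x-api-key` header are assumptions that should be checked against the API documentation of the target instance.

```python
import json
import urllib.request


def build_request(base_url, api_key, path, payload=None):
    """Build (but do not send) a request against a Galaxy-style REST API."""
    url = base_url.rstrip("/") + "/api/" + path.lstrip("/")
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    method = "GET" if data is None else "POST"
    request = urllib.request.Request(url, data=data, method=method)
    # Assumption: the server accepts the API key in an "x-api-key" header.
    request.add_header("x-api-key", api_key)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Placeholder server and key; substitute real values before sending.
BASE_URL = "https://galaxy.example.org"
API_KEY = "YOUR_API_KEY"

# List existing histories, then create a new one by name.
list_histories = build_request(BASE_URL, API_KEY, "histories")
create_history = build_request(BASE_URL, API_KEY, "histories", {"name": "rna-seq-demo"})

# To actually send a request (requires a reachable server):
#     with urllib.request.urlopen(list_histories) as response:
#         histories = json.load(response)
```

Building the request separately from sending it keeps the example runnable offline and makes the URL, method, and headers easy to inspect before any call reaches a server.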
Galaxy originated in 2005 at Pennsylvania State University as an effort to make bioinformatics analyses more accessible to bench scientists. Early development intersected with projects at the Broad Institute and drew on concepts from workflow systems such as Taverna and Kepler while adopting web frameworks influenced by the Django and Ruby on Rails communities. Over time, the project incorporated contributions from groups including the European Galaxy Team, the University of Freiburg, and Johns Hopkins University, as well as independent contributors coordinated through events like the Bioinformatics Open Source Conference and the Galaxy Community Conference. Milestones include support for container technologies such as Docker and Kubernetes and alignment with metadata standards from the Global Alliance for Genomics and Health.
Galaxy's architecture separates the web interface, job execution, and data storage, enabling modular integration with external services such as PostgreSQL, Apache HTTP Server, and cluster managers like SLURM and PBS Pro. The core is written in Python with an interactive front end that leverages JavaScript libraries and visualization tools from projects like Jupyter and D3.js for interactive plots. Key features include:
- Tool integration and wrappers that connect to command-line packages from repositories such as Bioconductor, HTSeq, SAMtools, Bowtie2, and BWA.
- Workflow editor supporting graphical composition, versioning, and sharing, interoperating with standards like Common Workflow Language and export to formats used by Nextflow and Snakemake.
- History and provenance tracking that records parameter settings, tool versions, and dataset lineage for compliance with policies from agencies like National Science Foundation and repositories including Zenodo and NCBI Sequence Read Archive.
- User management, reproducible publishing, and secure access controls compatible with identity providers such as ORCID and CILogon.
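The provenance tracking listed above can be pictured as a chain of records, one per tool run. The Python sketch below is a simplified illustration, not Galaxy's actual data model: `ProvenanceRecord`, `lineage`, the dataset names, the parameter names, and the version strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """One tool run: what produced a dataset, and from which inputs."""
    dataset_id: str
    tool_id: str
    tool_version: str
    parameters: dict = field(default_factory=dict)
    input_ids: tuple = ()


def lineage(dataset_id, records):
    """Return the records that led to dataset_id, upstream steps first."""
    ordered, seen = [], set()

    def visit(current_id):
        record = records.get(current_id)
        if record is None or current_id in seen:
            return  # no recorded producer (e.g. an upload), or already visited
        seen.add(current_id)
        for parent_id in record.input_ids:
            visit(parent_id)
        ordered.append(record)

    visit(dataset_id)
    return ordered


# Illustrative two-step history; versions and parameters are placeholders.
records = {
    r.dataset_id: r
    for r in [
        ProvenanceRecord("aligned.bam", "bowtie2", "1.0", {"preset": "sensitive"}, ("reads.fastq",)),
        ProvenanceRecord("counts.tsv", "htseq-count", "1.0", {"mode": "union"}, ("aligned.bam",)),
    ]
}
steps = [r.tool_id for r in lineage("counts.tsv", records)]
```

Because each record stores its parameters, tool version, and input identifiers, walking the chain backwards from any result recovers every step needed to reproduce it.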
Galaxy is applied in diverse projects spanning clinical research, environmental genomics, and education. Examples include RNA-seq pipelines used in studies by teams at Harvard Medical School and Stanford University, metagenomics analyses performed by researchers at the Woods Hole Oceanographic Institution and the Max Planck Society, and epigenomics workflows deployed in consortia like ENCODE and The Cancer Genome Atlas. In classroom settings, faculty at Carnegie Mellon University and the University of California, Berkeley use Galaxy for hands-on training linked to curricula from organizations like GOBLET and ELIXIR. Galaxy also supports translational workflows connected to clinical initiatives such as those of the Global Alliance for Genomics and Health.
Deployments range from single-host virtual machines to cloud-native, federated infrastructures on providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Scalability is achieved through container orchestration with Kubernetes, ephemeral execution in Docker containers, and job routing to compute backends managed by HTCondor or SLURM. National-scale deployments have been implemented by organizations including ELIXIR and de.NBI (German Network for Bioinformatics Infrastructure) and on national research clouds, enabling integration with data repositories such as the European Nucleotide Archive and identity federations such as eduGAIN.
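Job routing of the kind mentioned above can be sketched as a function that maps a job's resource request to a destination. The Python example below is a toy illustration only: the `Job` fields, the thresholds, and the destination names are hypothetical and do not reflect Galaxy's actual configuration format.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A pending tool run and the resources it asks for."""
    tool_id: str
    cores: int
    memory_gb: int
    needs_container: bool = False


def route(job):
    """Pick a destination name for a job using simple, made-up thresholds."""
    if job.needs_container:
        return "kubernetes"  # containerized tools go to the orchestrator
    if job.cores > 4 or job.memory_gb > 16:
        return "slurm"  # large jobs go to the cluster scheduler
    return "local"  # small jobs run on the Galaxy host itself


jobs = [
    Job("samtools_sort", cores=2, memory_gb=8),
    Job("bwa_mem", cores=16, memory_gb=64),
    Job("interactive_plot", cores=1, memory_gb=2, needs_container=True),
]
destinations = {job.tool_id: route(job) for job in jobs}
```

Keeping the routing decision in one small function makes it straightforward to add backends or adjust thresholds without touching the tools themselves.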
The Galaxy Project is stewarded through a community-driven governance model, with leadership and working groups made up of contributors from universities, research institutes, and companies, including Dataverse partners and commercial service providers. Development is coordinated through public repositories, mailing lists, and events like the Galaxy Community Conference and the Bioinformatics Open Source Conference. Licensing is permissive: the project has historically used the Academic Free License, which allows academic and commercial use while encouraging open contribution. Governance intersects with standards bodies including the Global Alliance for Genomics and Health and with infrastructure initiatives like ELIXIR and the National Institutes of Health data science programs.