Generated by GPT-5-mini

| ARC (software) | |
|---|---|
| Name | ARC |
| Developer | Hyperion Solutions; Broad Institute; OpenAI |
| Released | 2008 |
| Latest release version | 3.2 |
| Programming language | C++, Python, JavaScript |
| Operating system | Windows, macOS, Linux |
| License | Proprietary, open-source components |

ARC is a modular application platform for automating data integration, analysis, and workflow orchestration in bioinformatics, enterprise analytics, and scientific computing. It combines scheduling, plugin extensibility, and a graphical interface to connect heterogeneous tools and datasets, and has been used in projects associated with institutions such as the Broad Institute and the Massachusetts Institute of Technology and with corporations such as Microsoft Corporation and IBM. ARC has been used alongside Apache Software Foundation packages, NumPy, and TensorFlow in pipelines spanning clinical research, genomics, and cloud computing.
ARC emerged in the late 2000s amid a proliferation of pipeline managers and workflow systems influenced by initiatives at Lawrence Berkeley National Laboratory and the European Bioinformatics Institute. Early work drew on concepts from the Human Genome Project era and on collaborations involving contributors at the Broad Institute and the Howard Hughes Medical Institute. Subsequent development added integrations with cloud services from Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Community-led forks and extensions paralleled the movements around Apache Airflow and Nextflow as reproducible research and continuous integration practices matured.
ARC provides a visual workflow designer, a job scheduler, and a plugin system for orchestrating tasks that invoke tools such as BLAST, BWA, and GATK, as well as custom scripts written in Python, R, or Java. It supports data provenance tracking compatible with standards aligned with the FAIR principles and integrates authentication through protocols such as OAuth and LDAP. Monitoring integrates with Prometheus and Grafana, while artifact storage interoperates with object stores such as Amazon S3 and Google Cloud Storage. Enterprise features include role-based access control, used by organizations such as the National Institutes of Health and the European Molecular Biology Laboratory.
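
The plugin and provenance ideas described above can be illustrated with a minimal Python sketch. It does not reproduce ARC's actual plugin API, which this article does not document; the `register` decorator, the `PLUGINS` table, the `count_bases` plugin, and the record fields are all hypothetical names chosen for illustration.

```python
import hashlib
import json
from typing import Callable, Dict

# Hypothetical plugin table: maps a plugin name to a callable taking a params dict.
PLUGINS: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Decorator that adds a function to the plugin table under `name`."""
    def decorator(func):
        PLUGINS[name] = func
        return func
    return decorator


@register("count_bases")
def count_bases(params: dict) -> dict:
    """Toy plugin standing in for a real tool: counts nucleotides in a sequence."""
    seq = params["sequence"]
    return {base: seq.count(base) for base in "ACGT"}


def run_with_provenance(name: str, params: dict) -> dict:
    """Run a plugin and return its result with a simple provenance record.

    The record stores the plugin name and a SHA-256 digest of the canonical
    JSON form of the parameters, so identical inputs yield identical digests.
    """
    result = PLUGINS[name](params)
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    return {
        "plugin": name,
        "params_sha256": hashlib.sha256(canonical).hexdigest(),
        "result": result,
    }


if __name__ == "__main__":
    print(run_with_provenance("count_bases", {"sequence": "ACGTAC"}))
```

Hashing a canonical serialization of the inputs is one common way to make a run identifiable for later comparison; a production provenance model would usually also record tool versions, timestamps, and input file checksums.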
ARC's architecture typically separates a controller, worker agents, and a persistence layer, a pattern reminiscent of distributed systems pioneered at Google and discussed in the literature of ACM conferences. The controller implements a directed acyclic graph (DAG) executor influenced by designs in Apache Airflow and by academic descriptions from Stanford University research groups. Worker agents run on nodes managed by orchestration engines such as Kubernetes or by resource managers such as Slurm. Persistent metadata is stored in relational databases such as PostgreSQL or MySQL, while object artifacts are kept in cloud storage services similar to those of Dropbox and Box, Inc. Security models reference guidelines from the National Institute of Standards and Technology and the compliance regimes followed by Food and Drug Administration-regulated labs.
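
The core idea of a DAG executor, running each task only after all of its upstream tasks have finished, can be sketched in a few lines of Python. This is an illustrative single-process sketch built on the standard library's `graphlib.TopologicalSorter`, not ARC's controller; `run_dag`, `EXAMPLE_TASKS`, and `EXAMPLE_DEPS` are hypothetical names, and a real controller would dispatch tasks to worker agents rather than call them in-process.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps):
    """Run every task after all of its upstream tasks.

    tasks: dict mapping a task name to a callable that takes a dict of
           upstream results keyed by task name.
    deps:  dict mapping a task name to the set of task names it depends on.
    Returns (execution_order, results).
    """
    unknown = {d for ups in deps.values() for d in ups} - set(tasks)
    if unknown:
        raise KeyError(f"dependencies on undefined tasks: {sorted(unknown)}")

    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    order, results = [], {}
    # static_order() raises graphlib.CycleError if the graph is not acyclic,
    # and it does so before any task has been run.
    for name in sorter.static_order():
        upstream = {d: results[d] for d in deps.get(name, set())}
        results[name] = tasks[name](upstream)
        order.append(name)
    return order, results


# Hypothetical three-step pipeline; the lambdas stand in for real tool invocations.
EXAMPLE_TASKS = {
    "align": lambda up: "reads.bam",
    "call_variants": lambda up: up["align"].replace(".bam", ".vcf"),
    "report": lambda up: f"summary of {up['call_variants']}",
}
EXAMPLE_DEPS = {"call_variants": {"align"}, "report": {"call_variants"}}

if __name__ == "__main__":
    order, results = run_dag(EXAMPLE_TASKS, EXAMPLE_DEPS)
    print(order)
    print(results["report"])
```

Because the example pipeline is a linear chain, its execution order is fully determined. For graphs with independent branches, any order that respects the dependencies is valid, which is what allows a scheduler to run such branches in parallel on separate workers.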
Development of ARC has involved contributors from academic labs, commercial vendors, and open-source projects associated with foundations such as the Apache Software Foundation and the Linux Foundation. Mailing lists, issue trackers, and continuous integration pipelines have been hosted on platforms similar to GitHub and GitLab. ARC implementations have been presented at conferences and workshops including Bioinformatics Open Days, sessions at EMBL-EBI meetings, and tracks at NeurIPS and ISMB where workflow reproducibility is discussed. User communities include bioinformatics cores at institutions such as the Wellcome Trust Sanger Institute and data engineering teams at companies such as Intel Corporation and NVIDIA.
ARC has been adopted for high-throughput sequencing pipelines at facilities such as the Broad Institute's core laboratories, for clinical genomics services connected to the Centers for Disease Control and Prevention, and for environmental sequencing studies coordinated by groups such as the United Nations Environment Programme. Other use cases include orchestrating machine learning model training for teams at OpenAI and financial analytics prototypes at firms associated with Goldman Sachs and JPMorgan Chase. Integrations with laboratory information management systems used by Thermo Fisher Scientific and Illumina illustrate cross-vendor workflows, while collaborations with cloud providers facilitate scalable deployments for projects with the European Space Agency and national supercomputing centers.
ARC distributions have historically combined proprietary components, community-maintained open-source modules, and connectors released under permissive terms modeled on the Apache License and the MIT License. Commercial editions offered enterprise support, subscription services, and certified integrations with systems from Red Hat and Oracle Corporation. Community editions and third-party forks have been available under open-source terms from repositories on platforms resembling GitHub, while commercially packaged installers have been distributed through channels such as the Microsoft Store and through enterprise procurement at organizations such as Siemens.
Category:Workflow management systems