| Globus Project | |
|---|---|
| Name | Globus Project |
| Founded | 1995 |
| Headquarters | University of Chicago |
The Globus Project is a research and software initiative that develops middleware and managed services for data transfer, identity management, and distributed computation across scientific and scholarly research infrastructures. It originated in academic collaborations and works with major laboratories, universities, funding agencies, and international research consortia to enable data-intensive workflows, high-performance computing, and collaborative science. Its software and services integrate with supercomputers, national research facilities, and cloud platforms to support reproducible science, large-scale simulations, and multi-institutional data sharing.
The project began in the mid-1990s through collaborations among researchers at the University of Chicago, Argonne National Laboratory, the University of Southern California's Information Sciences Institute, and partners involved in National Science Foundation and Department of Energy cyberinfrastructure initiatives. Early milestones connected to grid computing efforts such as TeraGrid, the Open Science Grid, and the European Grid Infrastructure shaped the design of its flagship software, the Globus Toolkit, which evolved alongside contemporaneous projects such as Condor and Sun Grid Engine. Funding and programmatic interactions included grants from the NSF and cooperative agreements with laboratory programs at Oak Ridge National Laboratory and Lawrence Berkeley National Laboratory. The project evolved through integration with workflows used in collaborations such as the Large Hadron Collider experiments, consortia that followed the Human Genome Project, and astronomy surveys coordinated with facilities such as the National Radio Astronomy Observatory and the European Southern Observatory.
The architecture comprises modular components that interoperate with compute and storage resources at institutions such as Los Alamos National Laboratory, Sandia National Laboratories, and campus clusters at universities including Stanford University and the Massachusetts Institute of Technology. Core components interface with identity federations like InCommon; protocol ecosystems including HTTPS, SSH, and GridFTP; and storage systems exemplified by Lustre, GPFS, and object platforms similar to services offered by Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Integration adapters connect to workload managers such as Slurm, PBS Professional, and TORQUE, while cataloging and metadata components reference standards from organizations such as the W3C and the Open Geospatial Consortium. Monitoring and telemetry layers follow patterns common to projects like Nagios and Prometheus, and client libraries support languages widely used in scientific computing, including Python, Java, and C++.
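An integration adapter of the kind described above typically translates an abstract job description into a scheduler-specific submission. A minimal sketch of such an adapter for Slurm's `sbatch` command follows; the function name and job fields are illustrative (not part of any Globus API), and only standard `sbatch` flags are used:

```python
import shlex

def sbatch_command(script_path, partition, nodes=1, time_limit="01:00:00"):
    """Build an sbatch invocation for a Slurm cluster.

    This only constructs the argument vector; it does not submit
    anything, so the adapter logic can be tested without a cluster.
    """
    return [
        "sbatch",
        f"--partition={partition}",
        f"--nodes={nodes}",
        f"--time={time_limit}",
        script_path,
    ]

cmd = sbatch_command("run_sim.sh", partition="compute", nodes=4)
print(shlex.join(cmd))
# sbatch --partition=compute --nodes=4 --time=01:00:00 run_sim.sh
```

Keeping command construction separate from execution is the common adapter pattern: the same job description can be rendered for PBS Professional or TORQUE by swapping the rendering function.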
Services include reliable high-performance file transfer, identity and access management, sharing and collaboration tools, and automation primitives for managing endpoints at facilities such as CERN, Fermilab, and Brookhaven National Laboratory. These capabilities map to data-movement workflows used by collaborations such as the IceCube Neutrino Observatory, long-tail science projects hosted in the Dryad repository, and large-scale imaging pipelines developed at NIH-funded centers and the Broad Institute. The project exposes APIs and web interfaces compatible with science gateways similar to Galaxy, workflow systems such as Apache Airflow and Nextflow, and data portals modeled after Dataverse. It supports authentication patterns built on OAuth 2.0, SAML 2.0, and the federated identity arrangements used by European Research Council grantees and national laboratories in coordinated research initiatives.
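The OAuth 2.0 patterns mentioned above follow RFC 6749; for service-to-service automation, a client typically obtains a bearer token via the client-credentials grant. A sketch of constructing such a token request with only the standard library (the endpoint URL, credentials, and scope string are placeholders, not real Globus values):

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

def token_request(token_url, client_id, client_secret, scope):
    """Build an RFC 6749 client-credentials token request.

    The client authenticates with HTTP Basic auth and requests a
    token limited to the given scope. The request object is only
    constructed here, never sent, so the sketch runs offline.
    """
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return Request(
        token_url,
        data=body.encode(),
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

req = token_request(
    "https://auth.example.org/v2/oauth2/token",  # placeholder endpoint
    "my-client-id",
    "my-client-secret",
    "urn:example:transfer.all",                  # placeholder scope
)
print(req.get_method(), req.full_url)
```

In practice a service would send this request, cache the returned access token until it expires, and attach it to API calls as an `Authorization: Bearer` header.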
Deployments occur at supercomputing centers including the National Energy Research Scientific Computing Center, the Argonne Leadership Computing Facility, and campus clusters across the Association of American Universities. Use cases encompass simulation campaign management for climate modeling groups participating in the Coupled Model Intercomparison Project, genomics pipelines in consortia such as ENCODE, particle physics data distribution for the ATLAS and CMS experiments, and multi-institutional imaging studies coordinated with programs such as The Cancer Genome Atlas. The project supports data stewardship practices aligned with agencies such as the National Institutes of Health and meets the data management planning expected by funders such as Horizon Europe. Commercial and private-sector integrations include partnerships with research divisions of firms such as IBM, Microsoft, and Amazon for hybrid cloud workflows.
Governance includes stewardship by academic institutions, collaborations with national laboratories, and participation in standards bodies, including IEEE and IETF working groups related to data movement and identity. Development follows open-source and research-software practices similar to projects hosted on platforms like GitHub, coordinated through issue tracking, continuous integration, and code review workflows familiar to contributors from Carnegie Mellon University and other university labs. Community engagement channels mirror those of consortia such as the Research Data Alliance, and training partnerships with organizations like XSEDE and ESnet support outreach, workshops, and developer sprints. Licensing and contribution policies reflect models used by the Apache Software Foundation and collaborations with institutional technology transfer offices.
Security practices align with the risk frameworks and compliance regimes encountered at Department of Energy national laboratories and published by agencies such as the National Institute of Standards and Technology. Measures include encrypted transport layers, federated authentication consistent with the Federal Risk and Authorization Management Program, access controls compatible with Department of Health and Human Services mandates for protected-health-information scenarios, and audit logging modeled after controls used by the CERT Coordination Center. The project integrates with institutional identity providers and aligns with policies similar to those of InCommon and regional research access federations to support compliance with data-use agreements, export controls, and institutional review board processes.
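Reliable transfer services commonly pair encrypted transport with end-to-end integrity checks and structured audit records. A minimal sketch of that pattern using SHA-256; the record fields here are illustrative, not a documented log schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit_record(path, expected_hexdigest):
    """Verify a transferred file against a known digest and emit an audit entry."""
    actual = sha256_of(path)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "path": path,
        "algorithm": "sha256",
        "verified": actual == expected_hexdigest,
    }

# Example: write a file, then verify it against a precomputed digest.
with open("payload.bin", "wb") as fh:
    fh.write(b"example data")
expected = hashlib.sha256(b"example data").hexdigest()
print(json.dumps(audit_record("payload.bin", expected)))
```

Emitting the verification result as a structured record, rather than free-form text, is what makes the log usable for the automated compliance reviews described above.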
Category:Research software projects