| Globus Online | |
|---|---|
| Name | Globus Online |
| Developer | University of Chicago; Argonne National Laboratory; Globus (company) |
| Released | 2012 |
| Programming language | C, Python, Java |
| Operating system | Linux, Microsoft Windows, macOS |
| License | Proprietary, service-based |
Globus Online
Globus Online is a managed data transfer and synchronization service designed for large-scale research data movement across distributed computing environments. It was developed to integrate high-performance data transfer tools with the identity and access management systems used by Fermilab, Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, and other research institutions. The service connects research projects, national laboratories, and scientific collaborations such as CERN, the LIGO Scientific Collaboration, the Human Genome Project, and NOAA for reliable transfer of petascale datasets.
Globus Online originated from research efforts at Argonne National Laboratory and the University of Chicago to simplify use of the Grid computing ecosystem and the Open Science Grid. Early prototypes were influenced by the Globus Toolkit and by collaborations with National Science Foundation funding programs and initiatives such as the TeraGrid project. Commercialization led to the formation of Globus (company), which offered the managed service used by projects including Large Hadron Collider experiments, Square Kilometre Array pathfinders, and bioinformatics consortia such as the 1000 Genomes Project. Milestones include integration with identity providers from InCommon, deployment on infrastructures operated by XSEDE, and adoption by regional consortia such as ESnet members.
The service architecture combines transfer agents, control servers, catalogs, and identity federation components. Core components include the transfer daemon (built on libraries from the GridFTP protocol lineage), the web-based transfer manager, and command-line interfaces supporting automation with tools like Apache Airflow, Globus SDK, and workflow engines used by Galaxy (informatics platform) and HTCondor. Authentication and authorization use federated identity with Shibboleth, OAuth 2.0 integrations, and attribute services provided by InCommon or institutional identity providers at universities such as Stanford University and Massachusetts Institute of Technology. Storage endpoints often run on systems provided by IBM Spectrum Scale installations, Panasas arrays, or Ceph deployments managed at facilities like NERSC.
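The automation path described above typically means a client assembles a transfer request document and submits it to the control servers over a REST API. The sketch below builds such a document in plain Python; the field names (`source_endpoint`, `transfer_items`, `sync_level`) and the endpoint names are illustrative assumptions, not the exact Globus Transfer API schema.

```python
import json

def build_transfer_task(source_endpoint, dest_endpoint, items,
                        verify_checksum=True, sync_level=None):
    """Assemble a transfer request document of the kind a REST-based
    transfer manager accepts. Field names are illustrative only."""
    task = {
        "source_endpoint": source_endpoint,
        "destination_endpoint": dest_endpoint,
        "verify_checksum": verify_checksum,
        "transfer_items": [
            {"source_path": src, "destination_path": dst}
            for src, dst in items
        ],
    }
    if sync_level is not None:
        # Higher sync levels re-transfer only files that changed.
        task["sync_level"] = sync_level
    return json.dumps(task)

# Hypothetical endpoint names used for illustration.
payload = build_transfer_task(
    "site-a#dtn",
    "site-b#dtn",
    [("/data/run42/out.h5", "/project/archive/out.h5")],
    sync_level=2,
)
```

In a real deployment the serialized payload would be POSTed with an OAuth 2.0 bearer token obtained from the federated identity flow; here only the request body is sketched.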
Services include point-to-point high-performance transfers, third-party transfers between endpoints, checksum verification, partial-file transfer, and directory synchronization. User-facing features encompass a web UI, REST APIs, command-line clients, and SDKs for integration with platforms like Jupyter Notebook, Zenodo, and Globus Nexus identity linking. Administrative features support group management using concepts from COmanage and project-based access modeled after GitHub organization structures used by collaborative science teams. Interoperability extends to storage protocols and middleware such as NFS, S3, iRODS, and integration adapters for computing resources at Argonne Leadership Computing Facility.
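Checksum verification and directory synchronization, as listed above, reduce to the same primitive: hash each file in bounded-size chunks and re-transfer only what is missing or differs. A minimal stdlib sketch, with no claim to match the service's internal algorithm:

```python
import hashlib
from pathlib import Path

def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files never load fully into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def files_to_sync(source_dir, dest_dir):
    """Return relative paths whose destination copy is missing or differs."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    stale = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        if not dst.exists() or file_checksum(src) != file_checksum(dst):
            stale.append(rel)
    return stale
```

Production transfer daemons typically also compare sizes and modification times first, falling back to checksums only when those are inconclusive, since hashing both sides of a petascale dataset is itself expensive.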
Adoption spans government laboratories, universities, and international collaborations: notable adopters include Lawrence Livermore National Laboratory, Brookhaven National Laboratory, European Organization for Nuclear Research, and consortia like ELIXIR. Use cases cover data replication for astronomy surveys like those from Pan-STARRS, genomic pipelines in projects aligned with European Bioinformatics Institute, and climate model distribution linked to NOAA National Centers for Environmental Information. Workflows often couple transfers with compute submissions to schedulers such as Slurm Workload Manager and PBS Professional in environments at centers like Oak Ridge Leadership Computing Facility.
Designed for high-throughput transfers, the system leverages parallel TCP streams, tuning parameters from the perfSONAR community, and network engineering practices championed by ESnet to achieve multi-gigabit-per-second transfer rates across research networks. Benchmarks reported in white papers compare performance against native rsync and basic scp transfers, demonstrating orders-of-magnitude improvements for wide-area transfers between facilities such as Brookhaven National Laboratory and NERSC. Scalability is attained through distributed transfer agents, load-balanced control endpoints, and regional deployment models similar to those used by Content Delivery Network operators serving scientific data.
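The reason parallel TCP streams matter on wide-area paths is the bandwidth-delay product: a single connection can only keep (window size) bytes in flight, so on a long fat pipe one capped connection cannot fill the link. A back-of-envelope sketch (the figures in the example are illustrative, not measurements from any named facility pair):

```python
import math

def required_window_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_s / 8

def streams_needed(bandwidth_bps, rtt_s, per_stream_window_bytes):
    """Parallel TCP streams needed when each connection's effective
    window is capped (by socket buffers, loss recovery, etc.)."""
    return math.ceil(required_window_bytes(bandwidth_bps, rtt_s)
                     / per_stream_window_bytes)

# Example: a 10 Gb/s path with 50 ms RTT needs 62.5 MB in flight;
# if each stream's effective window tops out at 4 MiB, 15 streams
# are required to saturate the link.
bdp = required_window_bytes(10e9, 0.050)          # 62_500_000.0 bytes
n = streams_needed(10e9, 0.050, 4 * 1024 * 1024)  # 15
```

This is the same arithmetic behind the tuning guidance published by the perfSONAR and ESnet communities: raise per-connection buffers where possible, and use parallelism to cover the remainder.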
Security architecture depends on federated authentication, role-based access control, encrypted transfers using TLS, and integrity checks via checksums and cryptographic hashes. Compliance considerations address institutional policies, export controls, and data governance frameworks relevant to health data and human subjects research coordinated with NIH policies and institutional review boards at universities like University of California, Berkeley. Enterprise features include audit logging, integration with identity providers such as Microsoft Azure Active Directory, and support for contractual data-use agreements used by collaborations with NASA and national laboratories.
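The TLS posture described above (verified peers, hostname checking, a minimum protocol floor) can be expressed directly with Python's standard-library `ssl` module. This is a minimal client-side sketch of such a policy, not the service's actual configuration; the TLS 1.2 floor is an assumed compliance baseline.

```python
import ssl

# A client-side TLS context of the kind a transfer client might use:
# create_default_context() enables certificate verification and
# hostname checking by default.
ctx = ssl.create_default_context()

# Pin a minimum protocol version; TLS 1.2+ is a common compliance floor.
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
```

Wrapping a socket with this context (`ctx.wrap_socket(sock, server_hostname=...)`) then refuses connections to peers with invalid certificates or mismatched hostnames, which is the behavior audit and data-governance frameworks generally require.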
Category:Data transfer software
Category:Research infrastructure
Category:Scientific computing