Building Research Infrastructure at Scale

Inside Globus's Hybrid SaaS Architecture

January 26, 2026 | CODERLEGION
By Tom Smith

Research computing faces a unique architectural challenge: how do you build a platform that works across institutional boundaries, handles petabyte-scale data movement, supports diverse storage systems, and maintains fine-grained security controls, all without requiring users to be infrastructure experts?

Globus, a non-profit service from the University of Chicago, has spent nearly 30 years solving this problem. What started as the Globus Toolkit for grid computing has evolved into a comprehensive platform-as-a-service that now handles 2 petabytes of data transfer daily across 2,600+ institutions in 80+ countries.

The Hybrid Architecture

The key architectural insight behind Globus is its hybrid model. A conventional SaaS provider would host the data and compute itself. “We can’t do that because all the institutions own their own resources,” explains Rachana Ananthakrishnan, Executive Director of Globus. “We can’t sort of take over those resources. We have to figure out how to add valuable services as a layer to that resource.”

This means Globus doesn’t centralize data—instead, it orchestrates movement between endpoints while data flows directly over high-speed research networks (often 100-400 Gbps). The architecture consists of three main components:

Local Agents run at institutional sites. The Globus Connect agent handles data operations with plugins for different storage types (POSIX, object stores, tape archives). The Globus Compute agent manages remote function execution on various schedulers (Slurm, PBS, Kubernetes). Action Providers allow integration with custom APIs.

Global Management Services run as a multi-tenant SaaS. These services handle orchestration, heuristic-based optimization, and reliability guarantees, and they maintain the security fabric. Crucially, they hold only control-plane connections; file data moves directly between endpoints.

Security Fabric uses OAuth and OIDC to federate authentication across 2,140+ identity providers. This allows researchers to use institutional credentials without creating new accounts, while maintaining fine-grained authorization across distributed resources.
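The split between the hosted control plane and the institution-owned data plane can be made concrete with a toy model. The sketch below is illustrative only: `Endpoint` and `Orchestrator` are hypothetical names, not Globus classes, and an in-memory dictionary stands in for real storage. The point it shows is that the orchestrator records task metadata while the bytes pass from one endpoint to the other.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical stand-in for storage at an institutional site."""
    name: str
    files: dict[str, bytes] = field(default_factory=dict)

    def send(self, path: str, dest: "Endpoint") -> int:
        # Data plane: bytes go straight from this endpoint to the peer.
        data = self.files[path]
        dest.files[path] = data
        return len(data)


@dataclass
class Orchestrator:
    """Hypothetical stand-in for the hosted, multi-tenant control plane."""
    task_log: list[dict] = field(default_factory=list)

    def transfer(self, src: Endpoint, dst: Endpoint, path: str) -> dict:
        # Control plane: instruct the source, then keep metadata only.
        moved = src.send(path, dst)
        task = {"src": src.name, "dst": dst.name, "path": path, "bytes": moved}
        self.task_log.append(task)
        return task


lab = Endpoint("lab", {"/scan.h5": b"\x00" * 1024})
hpc = Endpoint("hpc")
service = Orchestrator()
task = service.transfer(lab, hpc, "/scan.h5")
```

After the call, `hpc.files` holds the payload, while `service.task_log` holds only names, the path, and a byte count.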

Platform Capabilities

Globus offers six core capabilities that can be used independently or composed:

Managed Transfer provides fire-and-forget data movement with automatic retry, checksum verification, and progress tracking. “The protocol here continuously sends what are called markers,” Ananthakrishnan explains. “So it says I’ve moved 5 bytes, 10 bytes, 20 bytes, and suddenly there’s no marker. Then this keeps retrying with a backoff algorithm.”
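The marker-and-backoff behavior described in the quote can be sketched roughly as follows. This is not the Globus transfer protocol: `Stalled`, `run_with_retry`, and `flaky_transfer` are hypothetical names, the stall is simulated, and exponential backoff with a cap is an assumed policy, since the article says only "a backoff algorithm".

```python
from typing import Callable, Iterator


class Stalled(Exception):
    """Hypothetical signal that an expected progress marker never arrived."""


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def run_with_retry(attempt: Callable[[int], Iterator[int]], total_bytes: int,
                   max_retries: int = 5,
                   sleep: Callable[[float], None] = lambda seconds: None):
    """Resume from the last marker after each stall; return (offset, waits)."""
    offset = 0
    waits: list[float] = []
    delays = backoff_delays()
    while offset < total_bytes:
        try:
            for marker in attempt(offset):
                offset = marker  # last confirmed byte count
            if offset < total_bytes:
                raise Stalled("stream ended before completion")
        except Stalled:
            if len(waits) >= max_retries:
                raise  # give up and surface the failure
            waits.append(next(delays))
            sleep(waits[-1])  # no-op by default so the sketch runs instantly
    return offset, waits


def flaky_transfer(stall_points: set[int], total: int, step: int = 5):
    """Simulated source that stalls once at each byte count in stall_points."""
    pending = set(stall_points)

    def attempt(start: int) -> Iterator[int]:
        pos = start
        while pos < total:
            pos = min(pos + step, total)
            if pos in pending:
                pending.discard(pos)
                raise Stalled(f"no marker for byte {pos}")
            yield pos

    return attempt


offset, waits = run_with_retry(flaky_transfer({10, 20}, total=20), total_bytes=20)
```

In real use `time.sleep` would be passed as `sleep`. Because each retry resumes from the last marker, bytes that were already confirmed are not sent again.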

Data Sharing creates permission overlays on existing storage without staging data. Share with any identity, set time restrictions, and maintain audit logs, all without creating local accounts for collaborators.
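One way to picture a permission overlay is as a list of rules evaluated on each request, kept separate from the file system's own accounts. The sketch below is an assumption-laden illustration: `AccessRule` and `is_allowed` are hypothetical names, and the rule fields (identity, path, permissions, expiry) are inferred from the description above rather than taken from the Globus API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical overlay rule; no local account is created for `identity`."""
    identity: str
    path: str
    permissions: str                    # "r" or "rw"
    expires: Optional[datetime] = None  # None means no time restriction


def is_allowed(rules, identity: str, path: str, action: str, now: datetime) -> bool:
    """Return True if any unexpired rule grants `action` on `path`."""
    if action not in ("r", "w"):
        raise ValueError("action must be 'r' or 'w'")
    target = PurePosixPath(path)
    for rule in rules:
        if rule.identity != identity:
            continue
        if rule.expires is not None and now >= rule.expires:
            continue  # the time restriction has lapsed
        root = PurePosixPath(rule.path)
        if target != root and root not in target.parents:
            continue  # the rule covers a different subtree
        if action in rule.permissions:
            return True
    return False


rules = [
    AccessRule("alice@partner.example", "/projects/birds", "r",
               expires=datetime(2026, 6, 1, tzinfo=timezone.utc)),
]
today = datetime(2026, 1, 26, tzinfo=timezone.utc)
can_read = is_allowed(rules, "alice@partner.example",
                      "/projects/birds/2025/a.csv", "r", today)
```

The data stays where it is; only the rule list changes when a collaborator is added or a share expires.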

Unified Access abstracts heterogeneous storage through an open connector ecosystem. The Community Connector Program allows vendors and users to build plugins for any storage system, while Globus handles authentication, translation, and rate limiting.
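The connector idea amounts to a common interface that each storage plugin implements. The sketch below is a generic plugin pattern, not the Community Connector Program's actual interface: `Connector`, `PosixConnector`, `InMemoryObjectStore`, and `copy_between` are hypothetical names, and the object store is simulated with a dictionary.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical minimal interface every storage plugin would implement."""

    @abstractmethod
    def list_names(self, path: str) -> list[str]: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class PosixConnector(Connector):
    """Maps connector paths onto a directory tree (no traversal checks here)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _full(self, path: str) -> str:
        return os.path.join(self.root, path.lstrip("/"))

    def list_names(self, path: str) -> list[str]:
        return sorted(os.listdir(self._full(path)))

    def read(self, path: str) -> bytes:
        with open(self._full(path), "rb") as handle:
            return handle.read()

    def write(self, path: str, data: bytes) -> None:
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as handle:
            handle.write(data)


class InMemoryObjectStore(Connector):
    """Simulated object store: flat keys, with prefixes instead of directories."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def list_names(self, path: str) -> list[str]:
        prefix = path.lstrip("/")
        return sorted(key for key in self.objects if key.startswith(prefix))

    def read(self, path: str) -> bytes:
        return self.objects[path.lstrip("/")]

    def write(self, path: str, data: bytes) -> None:
        self.objects[path.lstrip("/")] = data


def copy_between(src: Connector, src_path: str, dst: Connector, dst_path: str) -> None:
    """The caller never needs to know which storage type is on either side."""
    dst.write(dst_path, src.read(src_path))


with tempfile.TemporaryDirectory() as root:
    posix = PosixConnector(root)
    posix.write("/runs/a.dat", b"abc")
    store = InMemoryObjectStore()
    copy_between(posix, "/runs/a.dat", store, "archive/a.dat")
    posix_listing = posix.list_names("/runs")
```

Supporting a new storage system then means writing one more subclass, while code such as `copy_between` stays unchanged.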

Search provides schema-agnostic metadata indexing with dynamic schema detection and fine-grained visibility controls. Think Elasticsearch optimized for research with built-in access control.
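A toy index helps show what "schema-agnostic" and "fine-grained visibility" mean together. The sketch below is not the Globus Search API: `MetadataIndex` and `flatten` are hypothetical, matching is exact-value only, and the `"public"` marker is an assumed convention for entries anyone may see.

```python
from typing import Any, Iterator


def flatten(doc: dict, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted.path, value) pairs for arbitrarily nested metadata."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


class MetadataIndex:
    """Hypothetical index: no schema is declared before ingesting."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, dict, set]] = []

    def ingest(self, subject: str, content: dict, visible_to: list[str]) -> None:
        self.entries.append((subject, dict(flatten(content)), set(visible_to)))

    def known_fields(self) -> list[str]:
        # The "schema" is simply whatever fields have been seen so far.
        return sorted({name for _, fields, _ in self.entries for name in fields})

    def search(self, identity: str, field: str, value: Any) -> list[str]:
        # Visibility is checked per entry, before any match is returned.
        return [
            subject
            for subject, fields, visible in self.entries
            if ("public" in visible or identity in visible)
            and fields.get(field) == value
        ]


index = MetadataIndex()
index.ingest("scan-001", {"instrument": {"beamline": "A"}, "sample": "lysozyme"},
             visible_to=["public"])
index.ingest("scan-002", {"instrument": {"beamline": "A"}, "pi": "dana"},
             visible_to=["dana@lab.example"])

public_hits = index.search("guest@elsewhere.example", "instrument.beamline", "A")
owner_hits = index.search("dana@lab.example", "instrument.beamline", "A")
```

The two records have different shapes, yet both are searchable by `instrument.beamline`; the guest sees only the public one.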

Compute brings function-as-a-service to on-prem HPC. Submit Python functions that run on remote systems with federated authentication. One COVID-era drug screening project used this to distribute ML workloads across diverse resources, completing training cycles in 31 seconds, where traditional approaches took much longer.
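The programming model is submit-a-function, get-a-future. The sketch below deliberately does not use the Globus Compute SDK: a standard-library `ThreadPoolExecutor` stands in for a remote endpoint so the example runs anywhere, and `score_molecule` is a toy placeholder for a real inference call. Only the shape of the pattern is meant to carry over.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def score_molecule(smiles: str) -> int:
    """Toy stand-in for an ML scoring function: counts the letter C."""
    return smiles.upper().count("C")


def screen(candidates: list[str], executor: Executor) -> dict[str, int]:
    """Fan the function out over the candidates, then gather the results."""
    futures = {smiles: executor.submit(score_molecule, smiles) for smiles in candidates}
    return {smiles: future.result() for smiles, future in futures.items()}


# A local thread pool stands in for a remote HPC endpoint (assumption).
with ThreadPoolExecutor(max_workers=2) as local_stand_in:
    scores = screen(["CCO", "c1ccccc1", "O=C=O"], local_stand_in)
```

Because `screen` depends only on the `submit` interface, swapping the local pool for a remote executor would not change the calling code.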

Flows offers declarative workflow automation using JSON-based definitions. Execution is event-driven, with AWS Step Functions under the hood, extended to securely call outside services and remote systems.
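A JSON-based flow definition might look something like the structure below, written as a Python dictionary so it can be checked and serialized. The field names follow the general style of state-machine definitions (`StartAt`, `States`, `Next`, `End`) and are assumptions, not a validated Globus Flows schema. The `example.org` URLs are placeholders, and `execution_order` is a hypothetical helper that only walks a linear chain of states.

```python
import json

# Illustrative definition: move data, then analyze it (all names assumed).
flow_definition = {
    "StartAt": "MoveData",
    "States": {
        "MoveData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/transfer",  # placeholder
            "Parameters": {"source": "beamline", "destination": "hpc"},
            "Next": "Analyze",
        },
        "Analyze": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/compute",  # placeholder
            "Parameters": {"function": "solve_structure"},
            "End": True,
        },
    },
}


def execution_order(definition: dict) -> list[str]:
    """Follow Next links from StartAt until a state marked End is reached."""
    order: list[str] = []
    name = definition["StartAt"]
    while name is not None:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        order.append(name)
        state = definition["States"][name]
        name = None if state.get("End") else state["Next"]
    return order


as_json = json.dumps(flow_definition, indent=2)  # the form a service would receive
steps = execution_order(flow_definition)
```

The definition is plain data with no imperative code, which is what makes it declarative.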

Real-World Performance

The Advanced Photon Source at Argonne National Laboratory demonstrates the platform’s capabilities at scale. Multiple beamlines use Globus to automate data movement and processing. “These data services have taken the time to solve a structure from weeks to days and now to hours,” notes beamline scientist Darren Sherrell.

For developers, Globus provides comprehensive REST APIs and SDKs. The platform is extensible—you’re not locked into the web interface. The Franklin & Marshall College biology department even built custom flows that allow field volunteers to click a button, fill a questionnaire, and trigger automated processing. “It really increased the community participation,” Ananthakrishnan notes. “In place of teaching all his volunteers how to do these 10 things, he defined a flow.”

Compliance and Security

For protected data, Globus implements NIST 800-53 and 800-171 controls, signs BAAs for HIPAA, and complies with GDPR and data protection agreements. The architecture supports both personal data (visible only to the user) and shared data (visible across institutions) with appropriate access controls.

The security model is shared: Globus controls the platform layer, while institutions control their endpoints. “It’s a hybrid SaaS, it’s a shared security model,” Ananthakrishnan explains. “We do some controls, they have to do some controls.”

Getting Started

Globus offers a freemium model: basic features are free for non-profit research. Developers can start with the web interface or dive directly into the APIs. Documentation is available at docs.globus.org, and the platform includes a 90-day trial for qualified projects.

For architects building research platforms, Globus provides a reference architecture for handling distributed data at scale. The hybrid model, federated authentication, and connector ecosystem offer lessons applicable beyond scientific computing to any scenario requiring secure data movement across organizational boundaries.