Congratulations, You Have Built a Globus
We read with amusement “Dear friend, you have built a Kubernetes” and “Dear Sir, You have built a compiler”, two tongue-in-cheek commentaries on the tendency to build quick-and-dirty scripts because the best-of-breed solution is “overkill”. We said, “someone should write one of those for Globus”, but were of course too busy/lazy to do so. But then we realized that we could ask an AI. Prompting ChatGPT o1 with “Please write a version of the following that is about Globus rather than Kubernetes {followed by text at URL}” produced the following text, which is pretty much spot on.
Dear friend,
I’m afraid to inform you that you have built a Globus. I know you only wanted a “simple system” to transfer data between a couple of HPC clusters and your laptop. You said, “Globus is overkill” and “we can handle this with basic SSH and some scripts,” and yet, six months later, you have a pile of Python, Bash, and half-baked APIs that break every time someone tries to move a file larger than a few gigabytes, or tries to authenticate with anything besides a shared SSH key. Surely, switching to \texttt{rsync} would have solved everything; at least that way, you’d be relying on a standard file synchronization tool. But hold on: you realize \texttt{rsync} alone doesn’t handle identity and credential management, or let you queue large transfers in the background. “Do I really need separate mechanisms for user authentication, server endpoints, concurrency control, error handling, and encryption?” you ask yourself. “Surely not; our data flows are so simple: just HPC cluster A, HPC cluster B, a staging server, and a handful of users.” And so you march on, adding yet another set of scripts for user tokens, partial transfers, and error retries, certain that this will be the last patch your tangle of one-off solutions will ever need.
But wait—eventually, you realize you need to support external collaborators at other institutions. Now you have to deal with multiple ID providers, each with its own password policies and group membership tools. Tired, you spin up a homegrown identity bridging service and keep track of “trust relationships” in a JSON file that nobody dares edit. One of your team members suggests connecting everything via a single OIDC server. After that, you’re sure the authentication complexity will be gone forever.
Except if you quit or go on vacation, who’s going to maintain this labyrinth of scripts, credential wrappers, and half-documented endpoint definitions? The fragile multiplexing you set up over \texttt{ssh} tunnels? Who will remember the ephemeral “temp space” mounting procedures that only you seemed to understand? So you think, “Let’s keep it all under control with Ansible or Terraform, so we can treat our HPC endpoints as version-controlled, ready-to-spawn clones.” Certainly, you say, this is going to be simpler and easier to maintain than just using Globus. What glorious engineering, indeed.

In the final stretch of your journey to avoid building a Globus, your manager tells you that your transfer service must also handle user file sharing and direct HTTPS links for collaborators who refuse to learn \texttt{scp}. Handling sharing via ephemeral tokens requires you to design a brand-new sign-and-verify scheme. “Not my problem,” says your manager. So you write a separate token service that checks each user’s privileges and logs every operation. Done at last, you say to yourself, and without having to build a Globus. A user-friendly interface, a robust transfer engine, identity federation, multi-endpoint concurrency, ephemeral tokens, an API to manage shares. Dear friend, you have built a Globus.
Addressed to: Those who wanted to avoid Globus.
P.S. I don’t mean to imply that you can never roll your own data transfer and sharing system that suits your needs better than Globus, nor that Globus itself is without any complexity. I just want to gently caution you, dear friend, to consider why Globus exists, and which pains it relieves, before embarking on the joyfully maddening quest of building it yourself.
We would like to thank everyone whose enthusiasm for Globus helped train the AI!