While working on our Software Factory product, one of the challenges we faced was whether we could provide a scalable backend for Git. Traditionally, Git uses a file system as its backend, which means that outgrowing your file system requires moderately complex, manual operations to enlarge it, along with some downtime. To solve this, we started working on how to use Swift as a backend for Git.
In this first blog post, we explain the advantages of using Swift as a backend instead of a regular file system. Then we explain the operations performed server side when a Git client pushes or fetches objects. In a few days we will publish a second blog post where we will explain the solution we found and developed.
What could be the advantages of using an object storage system instead of a file system?
When storing Git repositories on a traditional file system, you must back them up regularly so that a hardware failure, for instance, does not cost you your projects. And what happens when your repositories keep growing and you run out of space? You then need to perform some fairly complex and risky operations to extend the storage capacity available to your repositories. These are some of the reasons (but certainly not all) why so many people prefer to use services like GitHub or Bitbucket.
The idea behind using OpenStack Swift as a backend for Git is to benefit from:
- its data safety, as it stores several replicas of each object
- its capability of being extended easily with almost no storage capacity limit
- its high availability, no single point of failure
- its capability to allow operations teams to maintain the cluster without any downtime
Quick overview of Git (server side)
When you use a Git client to push or fetch data to or from a Git repository, this is usually done via the smart protocol over HTTP or SSH. In both cases, two Git binaries are involved: git-upload-pack and git-receive-pack. The first is in charge of creating a pack for the requested references and sending it back to the client; the second receives a pack from a Git client and stores it in the remote repository. Both are invoked transparently on the server side by your Git client.
What happens when you want to fetch references from a remote repository?
Your Git client calls git-fetch-pack, which transparently invokes the git-upload-pack command on the server side. The server (git-upload-pack) then sends the client the list of references it has, along with the SHA-1 of the object each one points to. The client then tells the server which references it wants as well as which objects it already has locally. With that information, git-upload-pack can generate a custom pack file containing only the objects the client does not have. The custom pack is sent back to the client.
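To make the last step more concrete, here is a minimal sketch in Python of how a server could work out which objects belong in the custom pack. It is only an illustration, not the code of git-upload-pack: the function name `objects_to_send` and the toy `graph` dictionary (object id mapped to the ids it references) are assumptions made up for this example.

```python
from collections import deque


def objects_to_send(graph, wants, haves):
    """Return the ids reachable from `wants` but not from `haves`.

    `graph` is a hypothetical stand-in for the repository: it maps an
    object id to the list of object ids it references (e.g. parents).
    """
    def reachable(starts):
        seen = set()
        queue = deque(s for s in starts if s in graph)
        while queue:
            oid = queue.popleft()
            if oid in seen:
                continue
            seen.add(oid)
            # Follow only the references we actually know about.
            queue.extend(ref for ref in graph[oid] if ref in graph)
        return seen

    return reachable(wants) - reachable(haves)


# Toy history: c1 <- c2 <- c3. The client wants c3 and already has c1.
graph = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
missing = objects_to_send(graph, wants=["c3"], haves=["c1"])
print(sorted(missing))
```

In this toy history only c2 and c3 need to be packed, since c1 is already on the client.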
What happens when you want to push local references to a remote repository?
Your client calls git-send-pack, which transparently invokes the git-receive-pack command on the server side. First, the server (git-receive-pack) advertises the references it has. Then the client sends one or more command lines, each of which creates, deletes or updates a reference; every line carries the reference's old SHA-1, its new SHA-1 and its name. Next, the client sends a custom pack file with the objects the server does not have. Once the server has received and verified the pack, it updates its references with the new SHA-1. The C Git server explodes the received pack into a bunch of loose objects on the file system when the pack is small (below its configured unpack limit), and otherwise keeps it as a pack file.
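The command lines sent by the client can be illustrated with a short Python sketch. In the Git pack protocol, an all-zero old id marks a creation and an all-zero new id marks a deletion. The function name `classify_command` is hypothetical, and the sketch assumes 40-character hexadecimal SHA-1 ids and ignores the pkt-line framing around each line.

```python
ZERO_ID = "0" * 40  # all-zero SHA-1 meaning "no object"


def classify_command(line):
    """Split an '<old-id> <new-id> <ref-name>' line and name the operation."""
    # The first command line may carry a capability list after a NUL byte.
    line = line.split("\0", 1)[0].rstrip("\n")
    old_id, new_id, ref_name = line.split(" ", 2)
    if old_id == ZERO_ID and new_id == ZERO_ID:
        raise ValueError("old and new ids cannot both be zero")
    if old_id == ZERO_ID:
        kind = "create"
    elif new_id == ZERO_ID:
        kind = "delete"
    else:
        kind = "update"
    return kind, ref_name, old_id, new_id


# Example ids are made up for the illustration.
old, new = "a" * 40, "b" * 40
print(classify_command(f"{old} {new} refs/heads/master")[0])
print(classify_command(f"{ZERO_ID} {new} refs/heads/feature")[0])
print(classify_command(f"{old} {ZERO_ID} refs/heads/obsolete")[0])
```

The three calls cover an update of an existing branch, the creation of a new one and the deletion of an old one.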
Those two commands (git-receive-pack and git-upload-pack) are the only ones involved during fetch/push operations server side. To summarize, from a file system point of view the operations are:
- Read and write references from/to the file system
- Read and write packed references from/to the file system
- Read and write loose object files from/to the file system
- Read and write pack files from/to the file system
To be able to store our repositories in Swift, we need to replace the above file system operations with the corresponding Swift API calls. For more information on the Git smart protocol, please have a look here.
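As a rough illustration of what replacing file system operations means, the Python sketch below reads and writes a reference and a loose object through a small put/get interface instead of the file system. It is not the solution described in the follow-up post: the `InMemoryContainer` class is a hypothetical stand-in for an object storage container, and no real Swift API call is made. Only the loose object layout (a zlib-compressed "type size" header, a NUL byte, then the content, named after its SHA-1) follows what Git itself does.

```python
import hashlib
import zlib


class InMemoryContainer:
    """Hypothetical stand-in for an object storage container: name -> bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = data

    def get(self, name):
        return self._objects[name]


def write_loose_object(container, obj_type, content):
    """Store a Git loose object and return its SHA-1."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    sha = hashlib.sha1(header + content).hexdigest()
    container.put(f"objects/{sha[:2]}/{sha[2:]}", zlib.compress(header + content))
    return sha


def read_loose_object(container, sha):
    """Load a Git loose object and return (type, content)."""
    raw = zlib.decompress(container.get(f"objects/{sha[:2]}/{sha[2:]}"))
    header, content = raw.split(b"\0", 1)
    obj_type, _size = header.decode().split(" ")
    return obj_type, content


def write_ref(container, name, sha):
    container.put(name, sha.encode() + b"\n")


def read_ref(container, name):
    return container.get(name).decode().strip()


container = InMemoryContainer()
blob_sha = write_loose_object(container, "blob", b"hello\n")
# A real branch points to a commit; a blob id is used here only for brevity.
write_ref(container, "refs/heads/master", blob_sha)
print(read_ref(container, "refs/heads/master") == blob_sha)
print(read_loose_object(container, blob_sha))
```

Keeping the storage behind two small methods is the design point here: the same read and write helpers would work unchanged if `put` and `get` talked to a remote object store rather than a dictionary.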
In a second blog post
We will give a quick introduction to Dulwich, the project we use to tackle our challenge, and explain how we use it to handle Swift as a backend for storing repositories.