I’d like to share a few highlights regarding Swift from last week’s
OpenStack Summit in Vancouver. In my experience it was the best summit
so far, both for Swift and overall.
Several companies published the size of their Swift clusters, with Rackspace still leading at more than a hundred PB. Many more companies run Swift clusters ranging from several to dozens of petabytes (HP, Symantec, Softlayer, OVH, HudsonAlpha Biotech and Dreamworks, to name only a few), and there are many private clusters whose size is mostly unknown.
Design sessions / upcoming work
- Container sharding: storing millions of objects in a single container can slow down operations, especially if the container databases are not stored on SSDs. We continued discussing container sharding, which would solve this problem and also make operations easier, because the databases that have to be replicated will be smaller.
- Changing policies: now that erasure coding has landed and more backends are coming to Swift, it becomes important to be able to change storage policies and migrate data between them (for example, to move cold data to less expensive policies such as erasure coding, or even to tape in the future).
- On-Disk Encryption: IBM, HP and SwiftStack are doing most of the work on this, and there were joint sessions with a few people from the Barbican team. A production-ready implementation should be available by the end of the Liberty cycle. Demand is driven mainly by companies that need to comply with regulations.
- Storlets: the idea is to develop a common framework for running code (not only Python) on the storage nodes to process data. Instead of moving data to compute nodes and back to Swift, an operator can run computations directly on the storage nodes, using their otherwise unused CPU cycles. IBM already has some POCs that can run Java on the storage nodes.
- Tape: there is particular interest from the scientific community (for example CERN and DKRZ) in making tape access easier. A public HTTP REST API is a good fit for this, so Swift might be an answer, even though there are technical challenges today (for example, clients time out while waiting for tapes to be mounted, and many small operations have to be avoided). IBM and BDT (a tape drive vendor) showcased their POCs, and there will be follow-up work between the different groups.
- Hummingbird: Rackspace rewrote the object server and replicator in Go to increase performance drastically. They had experimented with PyPy and other approaches in the past, but found that the performance gains achievable with Python are limited. There is now a feature branch within Swift:
- Monitoring: I gave a talk on monitoring Swift (using statsd and Graphite), and there was huge interest in collaborating on a common set of Graphite dashboards within the community, especially from the companies that already run Swift in production (SwiftStack, IBM, Symantec, HP). I’m looking forward to this, because we can share operational experience and create a universal dashboard that works out of the box, so that not everyone has to repeat the same work. There will also be further improvements to the existing monitoring tools in Swift itself.
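To make the container sharding idea above a bit more concrete, here is a minimal sketch of one way to shard a container: split its sorted object listing into contiguous name ranges, each of which would live in its own, smaller database. This is only an illustration under the assumption that shards are split by object name. The names `find_shard_ranges`, `shard_for` and the `max_rows` threshold are hypothetical and are not necessarily the design the Swift developers settle on.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical representation: a shard range covers object names in
# (lower, upper]; None means the range is unbounded on that side.
ShardRange = Tuple[Optional[str], Optional[str]]


def find_shard_ranges(names: List[str], max_rows: int) -> List[ShardRange]:
    """Split a sorted object listing into name ranges of at most max_rows each."""
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    ranges: List[ShardRange] = []
    lower: Optional[str] = None
    # Every max_rows-th name becomes the (inclusive) upper bound of a range.
    for i in range(max_rows - 1, len(names), max_rows):
        upper = names[i]
        ranges.append((lower, upper))
        lower = upper
    # The last range is open-ended so that names sorting after the current
    # listing (e.g. newly uploaded objects) still have a home.
    ranges.append((lower, None))
    return ranges


def shard_for(name: str, ranges: List[ShardRange]) -> int:
    """Return the index of the shard range that contains name."""
    # Upper bounds of all ranges except the open-ended last one.
    uppers = [upper for _, upper in ranges[:-1]]
    # First range whose upper bound is >= name; past the end means the last range.
    return bisect.bisect_left(uppers, name)


if __name__ == "__main__":
    listing = ["a", "b", "c", "d", "e"]
    ranges = find_shard_ranges(listing, max_rows=2)
    print(ranges)
    for obj in listing:
        print(obj, "->", shard_for(obj, ranges))
```

Because every range stays small, each shard database is cheaper to replicate, which is the operational benefit mentioned in the discussion.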
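For readers new to the statsd side of the monitoring topic above: statsd clients send plain-text lines of the form `<name>:<value>|<type>` over UDP (for example `ms` for timers, `c` for counters, `g` for gauges, with an optional `|@<rate>` sample rate), and Graphite then stores and graphs the aggregated values. Swift's own servers can emit such metrics when statsd logging is configured (the `log_statsd_host` option). The small Python sketch below only shows the line format; the helper names, the host/port defaults and the example metric names are assumptions chosen for illustration.

```python
import socket


def format_statsd(name: str, value: float, metric_type: str,
                  sample_rate: float = 1.0) -> str:
    """Build one statsd line: <name>:<value>|<type>[|@<rate>]."""
    line = f"{name}:{value:g}|{metric_type}"
    if sample_rate < 1.0:
        # The sample rate tells the server how to scale sampled counters.
        line += f"|@{sample_rate:g}"
    return line


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send, as statsd clients usually do."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Illustrative metric names, loosely modelled on Swift proxy-server metrics.
    print(format_statsd("proxy-server.object.GET.200.timing", 12.5, "ms"))
    print(format_statsd("proxy-server.errors", 1, "c", sample_rate=0.5))
```

Using UDP means a slow or missing statsd server never blocks the storage service, which is why this style of metric collection fits well into a busy Swift cluster.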
Swift itself turned 5 years old during this summit, and I’m looking forward
to the next 5 years!