OpenStack Swift is an object storage solution built for massive scale, with a focus on durability and availability of the stored data. The latest release, 2.0, is the biggest change to the project since its open source release four years ago and adds a new feature called Storage Policies.
Until now, all data within a Swift cluster was distributed evenly across a set of disks, nodes, and data centers. But most likely not all data has the same storage requirements: for example, you might want to store some data on very fast SSDs only, other data in a different datacenter or country, and yet other data using a different storage backend.
With the introduction of storage policies it is now possible to choose a more appropriate storage location depending on the user requirements. This includes:
- using different levels of replication depending on the required data durability and availability (maybe using only a single copy for content that can be recreated easily)
- using different storage media, for example fast SSDs or slower spinning disks, based on performance and cost requirements
- storing data in specific datacenters or regions, which might be required by local laws
It also enables developers to add their own backends, and currently there are different solutions in the works, including erasure coding.
Upgrading
As always, Swift tries to make upgrading very easy: from an operator's point of view, no action is required when you upgrade your cluster, and there is no need to modify the existing configuration or add a storage policy to get it working. In fact, all existing data is assigned a default policy without any data being migrated.
The only thing you need to keep in mind is that you should not downgrade your cluster once you have created a second policy. Data stored within this policy won’t be accessible in a downgraded cluster because Swift doesn’t know where this data is stored.
Adding a new Storage Policy
The location of the data in a Swift cluster is defined by a ring, a consistent-hashing structure that includes all the disks and nodes. By default, the object ring is stored on all proxy and storage nodes in /etc/swift/object.ring.gz.
Each storage policy uses its own ring, so you need to create a new ring before adding a new storage policy. Create the new ring based on your requirements, for example using only nodes in a specific datacenter.
Let’s assume you have two data centers in different countries using a geo-replicated setup, storing at least one replica in the other datacenter. Now there might be some legal provisions that require you to store data only in a specific country.
To separate this data from the geo-replicated data, you can create one new ring per datacenter, each containing only the storage nodes in that datacenter. You end up with three rings:
- your existing default ring, containing all disks and nodes from all datacenters
- a new ring containing only nodes in the Paris datacenter
- a new ring containing only nodes in the Montreal datacenter
These new rings will be stored as /etc/swift/object-1.ring.gz and object-2.ring.gz on each node.
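The naming pattern for the ring files can be sketched in a few lines of Python. This is only an illustration of the convention described above; the helper name `ring_path` is hypothetical and not part of Swift's API.

```python
import posixpath

SWIFT_DIR = "/etc/swift"  # directory named in the text above


def ring_path(policy_index, swift_dir=SWIFT_DIR):
    """Return the object ring file for a storage policy index.

    Hypothetical helper: policy 0 keeps the original name object.ring.gz,
    every other policy N uses object-N.ring.gz.
    """
    if policy_index < 0:
        raise ValueError("policy index must not be negative")
    if policy_index == 0:
        name = "object.ring.gz"
    else:
        name = "object-%d.ring.gz" % policy_index
    return posixpath.join(swift_dir, name)


for index in (0, 1, 2):
    print(index, ring_path(index))
```

The index in the file name is the same number that appears in the matching [storage-policy:N] section of swift.conf.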
To make use of these new rings you need to add storage policies by adding them to /etc/swift/swift.conf:
    [storage-policy:0]
    name = original
    default = yes

    [storage-policy:1]
    name = paris

    [storage-policy:2]
    name = montreal
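To double-check such a configuration before rolling it out, the policy sections can be read with Python's standard configparser module. The following is a minimal sketch, not the way Swift itself loads its policies; the function name `load_policies` is made up for this example, and the configuration text is the example from above.

```python
import configparser

# Same content as the swift.conf example above.
SWIFT_CONF = """
[storage-policy:0]
name = original
default = yes

[storage-policy:1]
name = paris

[storage-policy:2]
name = montreal
"""


def load_policies(conf_text):
    """Return {index: {"name": str, "default": bool}} from swift.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    policies = {}
    for section in parser.sections():
        # Ignore unrelated sections such as [swift-hash].
        if not section.startswith("storage-policy:"):
            continue
        index = int(section.split(":", 1)[1])
        policies[index] = {
            "name": parser.get(section, "name"),
            "default": parser.getboolean(section, "default", fallback=False),
        }
    return policies


policies = load_policies(SWIFT_CONF)
defaults = [p["name"] for p in policies.values() if p["default"]]
for index in sorted(policies):
    print(index, policies[index]["name"])
print("default policy:", defaults)
```

A simple sanity check is that `defaults` contains exactly one name.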
Once you have added the new storage policies, users can select them when creating a new container. This is done by using a special header, X-Storage-Policy.
    swift post -H "X-Storage-Policy: paris" for_paris_only
Now all objects that are stored in the container "for_paris_only" will be placed only on nodes in ring 1. You can verify this by doing a "stat" on the container:
    swift stat for_paris_only
    Account: AUTH_test
    Container: for_paris_only
    […]
    X-Storage-Policy: paris
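The same header can be sent by any HTTP client, since creating a container through the Swift API is a PUT request to the container URL. The sketch below only builds such a request with Python's standard library and does not send it; the storage URL and the token are placeholder assumptions that you would replace with the values from your own authentication system.

```python
import urllib.request

# Placeholder values for illustration only.
STORAGE_URL = "http://127.0.0.1:8080/v1/AUTH_test"
AUTH_TOKEN = "replace-with-a-real-token"


def build_create_container_request(container, policy):
    """Build (but do not send) a PUT request that creates a container
    and selects its storage policy via the X-Storage-Policy header."""
    return urllib.request.Request(
        url="%s/%s" % (STORAGE_URL, container),
        method="PUT",
        headers={"X-Auth-Token": AUTH_TOKEN, "X-Storage-Policy": policy},
    )


request = build_create_container_request("for_paris_only", "paris")
print(request.get_method(), request.full_url)
# urllib stores header names via str.capitalize(); HTTP header names
# are case-insensitive, so the server sees the same header.
print(request.get_header("X-storage-policy"))
```

Passing the request object to `urllib.request.urlopen()` would actually send it to the cluster.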
Please note that it is not possible to change the storage policy of existing containers.
Account information will report the used bytes and number of objects for each storage policy separately, but only after the container-updater has run at least once.
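If you want to collect these per-policy numbers in a script, the account response headers can be grouped by policy. The sketch below assumes header names of the form X-Account-Storage-Policy-<name>-Bytes-Used and X-Account-Storage-Policy-<name>-Object-Count; treat this pattern as an assumption and compare it with the headers your cluster actually returns. The sample values are made up.

```python
import re

# Assumed header pattern, see the note above.
PATTERN = re.compile(
    r"^x-account-storage-policy-(?P<policy>.+)-(?P<stat>bytes-used|object-count)$",
    re.IGNORECASE,
)


def per_policy_usage(headers):
    """Group per-policy account statistics by policy name."""
    usage = {}
    for name, value in headers.items():
        match = PATTERN.match(name)
        if match is None:
            continue  # not a per-policy header
        policy = match.group("policy").lower()
        stat = match.group("stat").lower().replace("-", "_")
        usage.setdefault(policy, {})[stat] = int(value)
    return usage


# Made-up sample headers for illustration.
sample_headers = {
    "X-Account-Object-Count": "12",
    "X-Account-Storage-Policy-Original-Bytes-Used": "4096",
    "X-Account-Storage-Policy-Original-Object-Count": "7",
    "X-Account-Storage-Policy-Paris-Bytes-Used": "2048",
    "X-Account-Storage-Policy-Paris-Object-Count": "5",
}

usage = per_policy_usage(sample_headers)
for policy in sorted(usage):
    print(policy, usage[policy])
```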
Deprecating policies
Maybe you want to deprecate your existing default policy and use a different default policy. The following /etc/swift/swift.conf is an example of how to deprecate your previous default policy and make a different policy the default:
    [storage-policy:0]
    name = retired
    deprecated = yes

    [storage-policy:1]
    name = paris
    default = yes

    [storage-policy:2]
    name = montreal
Objects stored in containers with a deprecated policy are still accessible, but it is not possible to create new containers using a deprecated policy.
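The rule can be modelled in a few lines of Python. This is a simplified model of the behaviour described above, not Swift's actual implementation; `PolicyError` and `policy_for_new_container` are hypothetical names, and the policy list mirrors the swift.conf example.

```python
class PolicyError(Exception):
    """Raised when a container cannot be created with the requested policy."""


# Mirrors the swift.conf example above.
POLICIES = [
    {"index": 0, "name": "retired", "default": False, "deprecated": True},
    {"index": 1, "name": "paris", "default": True, "deprecated": False},
    {"index": 2, "name": "montreal", "default": False, "deprecated": False},
]


def policy_for_new_container(policies, requested=None):
    """Pick the policy for a new container, rejecting deprecated ones."""
    if requested is None:
        # No X-Storage-Policy header given: fall back to the default policy.
        defaults = [p for p in policies if p["default"]]
        if len(defaults) != 1:
            raise PolicyError("expected exactly one default policy")
        return defaults[0]
    for policy in policies:
        if policy["name"] == requested:
            if policy["deprecated"]:
                raise PolicyError("policy %r is deprecated" % requested)
            return policy
    raise PolicyError("unknown policy %r" % requested)


print(policy_for_new_container(POLICIES)["name"])
print(policy_for_new_container(POLICIES, "montreal")["name"])
try:
    policy_for_new_container(POLICIES, "retired")
except PolicyError as error:
    print("rejected:", error)
```

Existing containers are not affected by this check, which matches the statement that their objects stay accessible.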
Further information
Extensive documentation on Storage Policies is available in the Swift documentation:
http://docs.openstack.org/developer/swift/overview_policies.html
If you want to test policies your best bet is to use a Swift-All-In-One (SAIO) installation:
http://docs.openstack.org/developer/swift/policies_saio.html
Outlook
As of today storage policies enable operators to optimize their offering for their customers based on their needs, be it performance, specific storage locations or alternative replication levels.
But there is more to come: the community is currently working on a new erasure-coded storage backend. Erasure coding can lower storage costs considerably because it needs less raw capacity than keeping full replicas. Intel published a short video explaining the upcoming erasure coding feature:
http://www.intel.com/content/www/us/en/storage/swift-with-erasure-coding-storage-demo.html
The basic idea behind erasure coding is to split each object into multiple chunks and add parity chunks. For example, using a 10+4 scheme, an object of 10 MB is split into 10 data chunks of 1 MB each, and 4 parity chunks of the same size are added. The resulting 14 chunks are stored across the cluster (with a single copy of each chunk). Erasure coding can reconstruct the whole object from only 10 of the 14 chunks, thus providing better storage efficiency while maintaining a high level of data durability.
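The storage efficiency can be checked with some simple arithmetic. The sketch below compares the raw capacity needed for the 10+4 example with plain replication; the replica count of three is an assumption made for the comparison, and the function name is hypothetical.

```python
def raw_size_mb(object_mb, data_chunks, parity_chunks):
    """Raw capacity used by one object under a data+parity scheme."""
    return object_mb * (data_chunks + parity_chunks) / data_chunks


OBJECT_MB = 10
DATA_CHUNKS = 10
PARITY_CHUNKS = 4
REPLICAS = 3  # assumption: a replicated policy keeping three full copies

chunk_mb = OBJECT_MB / DATA_CHUNKS
ec_raw_mb = raw_size_mb(OBJECT_MB, DATA_CHUNKS, PARITY_CHUNKS)
replica_raw_mb = OBJECT_MB * REPLICAS

print("chunk size (MB):", chunk_mb)
print("raw capacity with 10+4 erasure coding (MB):", ec_raw_mb)
print("raw capacity with %d replicas (MB):" % REPLICAS, replica_raw_mb)
# Up to PARITY_CHUNKS chunks can be lost before the object is unreadable.
print("chunks that may be lost:", PARITY_CHUNKS)
```

Under these assumptions the erasure-coded object needs 14 MB of raw capacity instead of 30 MB, while still surviving the loss of any four chunks.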