We were faced with the challenge of migrating and synchronizing two Swift clusters, in order to give a customer an easy way to handle a Swift migration.
For that we created a project called swiftsync, hosted on GitHub: https://github.com/enovance/swiftsync
The swiftsync project provides two binaries:
- swsync (The synchronizer)
- swfiller (A swift filler)
swfiller is a tool designed to ease the process of filling a Swift cluster. It can be useful for testing swsync's capabilities on a test platform before using it on a production platform.
The second tool, swsync, is the synchronization tool. It migrates all Swift content from an origin server to a destination server using only the API of both proxy servers. The following content is synchronized:
- account and account metadata
- container and container metadata
- object and object metadata
swsync takes into account the data (account metadata, containers/objects, container/object metadata) already stored on the destination cluster in order to speed up and optimize copying: only data that is more recent on the origin is synchronized. The first run of swsync migrates all data from origin to destination, but subsequent runs only migrate modified data. The process is as follows:
- will synchronize account metadata if it has changed on the origin
- will delete a container on the destination if it no longer exists on the origin
- will create a container on the destination if it does not exist there
- will synchronize the destination container metadata if it differs from the origin container's
- will remove an object from the destination container if it no longer exists in the origin container
- will synchronize an object and its metadata if the last-modified header is more recent on the origin
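The object-level rules above can be sketched as a small planning function. This is only an illustration, not the actual swsync code: the function name `plan_container_sync` and the representation of a container as a dict mapping object names to last-modified timestamps are assumptions made for the example.

```python
def plan_container_sync(origin, dest):
    """Decide what to copy and what to delete for one container.

    `origin` and `dest` are hypothetical listings: dicts mapping an
    object name to its last-modified timestamp (seconds as a float).
    """
    # Objects that no longer exist on the origin are removed from the destination.
    to_delete = sorted(name for name in dest if name not in origin)
    # Objects missing on the destination, or more recent on the origin, are copied.
    to_copy = sorted(
        name for name, modified in origin.items()
        if name not in dest or modified > dest[name]
    )
    return to_copy, to_delete


origin = {"a": 100.0, "b": 200.0, "c": 300.0}
dest = {"a": 100.0, "b": 150.0, "d": 50.0}
to_copy, to_delete = plan_container_sync(origin, dest)
print(to_copy, to_delete)
```

In this example "a" is left alone because it is identical on both sides, "b" and "c" are copied, and "d" is deleted from the destination.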
swsync has been designed to be run again and again, without relying on the first pass going well: if, for example, there is a network failure, swsync will just skip the affected item and try again on the next run. The tool can therefore be launched by a cron job, for instance, to perform a differential synchronization each night.
swsync needs to run as a user that holds the ResellerAdmin role. This role lets the user perform all kinds of API operations on all Swift accounts, so swsync is able to explore all origin and destination accounts to evaluate which data it needs to synchronize.
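A minimal sketch of this skip-and-retry-later behaviour, assuming a hypothetical `sync_one` callable that synchronizes a single item and raises `OSError` (which includes connection errors in Python 3) on a network failure. The names `sync_all` and `sync_one` are made up for the example and are not part of swsync:

```python
def sync_all(names, sync_one):
    """Try to sync every item; skip the ones that fail and report them."""
    skipped = []
    for name in names:
        try:
            sync_one(name)
        except OSError as exc:
            # Do not abort the whole run: the next run will pick this item up
            # again, since it will still differ between origin and destination.
            print("skipping %s: %s" % (name, exc))
            skipped.append(name)
    return skipped


done = []

def flaky_sync(name):
    # Hypothetical stand-in for a real copy; fails for one item.
    if name == "b":
        raise ConnectionError("network unreachable")
    done.append(name)

skipped = sync_all(["a", "b", "c"], flaky_sync)
print(done, skipped)
```

Because every run compares origin and destination again, nothing needs to be remembered between runs: a skipped item simply shows up as out of date the next time.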
Both tools come with unit and functional tests. Unit tests are managed by tox, as in all OpenStack projects.
It is important to mention that the synchronization tool is currently a work in progress and has not really been tested on a cluster that holds a huge amount of data.
Have a look at the project README file for further information about the project.
Your object sync source code looks like it will quietly skip syncing anything beyond the first 10,000 objects in a container: where your code calls swiftclient's get_container() (defined in swiftclient/client.py) to list all objects, it will only get the first 10,000, since the optional parameter full_listing defaults to False. Even if full_listing were set to True, with millions of objects you could hit memory limitations.
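One way around both limits is to page through the listing with the marker parameter and process one page at a time. The sketch below is an assumption-laden illustration rather than a patch: `fetch_page(marker, limit)` is a hypothetical callable that would wrap get_container() with its marker and limit arguments and return the list of object entries, each a dict with a "name" key.

```python
def iter_listing(fetch_page, limit=10000):
    """Yield listing entries page by page, keeping only one page in memory."""
    marker = None
    while True:
        page = fetch_page(marker, limit)
        if not page:
            return  # an empty page means the listing is exhausted
        for entry in page:
            yield entry
        # The next page starts after the last name seen.
        marker = page[-1]["name"]


# Fake backend standing in for a container listing, for illustration only.
names = ["obj%03d" % i for i in range(25)]

def fake_fetch_page(marker, limit):
    rest = [n for n in names if marker is None or n > marker]
    return [{"name": n} for n in rest[:limit]]

listed = [entry["name"] for entry in iter_listing(fake_fetch_page, limit=10)]
print(len(listed))
```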
Hello Jim,
Thank you for your interest in Swiftsync. Yes, you are right: it skips the other objects. As the project had no bug tracker, I've just set the project up on Launchpad, where you can find the issue you reported and the patch I've submitted on OpenStack Gerrit. We can continue the discussion about this issue on Launchpad or even on Gerrit.
Cheers,
Fabien