For the past week or so I've been working on our Cassandra clusters at work, where we use cassandra-snapshotter to push our backups to S3 on a schedule. The only problem has been that we run it from cron and the tool is exceptionally chatty, so we get an email after every run.

We could suppress all the output of course, but then we would only find out that something had gone wrong when monitoring kicked in, and we would have no record of what went wrong. We also couldn't just redirect the output to a file, as the snapshotter uses Fabric to run commands on the remote nodes over SSH, and that output doesn't respect the redirection. So we added a quick job to our sprint to manage this output better.

We also had another issue: cassandra-snapshotter takes a list of the Cassandra nodes in a cluster. This is great if you're in your own datacentre. If you're in AWS, however, and using autoscaling groups so that you can reshape the cluster dynamically, you have to remember to update the host list whenever the cluster changes. So we also wanted to make it aware of our hosts.

As we were on a busy sprint, it seemed easiest to create a wrapper around cassandra-snapshotter in the interim, one that would let us handle the output better, use a config file, and find the nodes in the cluster programmatically.
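
To give an idea of the last point, here is a minimal sketch of looking up the hosts in an autoscaling group. It assumes boto3-style clients, which is not necessarily what the wrapper itself uses, and the function name `hosts_in_autoscale_group` is made up for illustration. The clients are passed in so the lookup can be tried against stand-ins without touching AWS.

```python
def hosts_in_autoscale_group(autoscaling, ec2, group_name):
    """Return the sorted private IPs of the instances in an autoscaling group.

    `autoscaling` and `ec2` are assumed to behave like boto3 clients, e.g.
    boto3.client("autoscaling", region_name="eu-west-1") and
    boto3.client("ec2", region_name="eu-west-1").
    """
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response["AutoScalingGroups"]
    if not groups:
        raise ValueError("no such autoscale group: %s" % group_name)

    instance_ids = [inst["InstanceId"] for inst in groups[0]["Instances"]]
    if not instance_ids:
        return []

    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    # Instances that are shutting down may no longer have a private IP, so skip them.
    return sorted(
        inst["PrivateIpAddress"]
        for reservation in reservations
        for inst in reservation["Instances"]
        if "PrivateIpAddress" in inst
    )


# Hypothetical usage with real clients (requires boto3 and AWS credentials):
#   import boto3
#   autoscaling = boto3.client("autoscaling", region_name="eu-west-1")
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   hosts = hosts_in_autoscale_group(autoscaling, ec2, "myautoscalegroup")
```

The sketch ignores pagination, which should be fine for a cluster of a handful of nodes but would need handling for larger groups.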

The result, which I've put on GitHub, is a wrapper that calls the snapshotter in a subprocess, captures all of its output and directs it through Python's logging module. The README is fairly self-explanatory, as is the config file, which is in YAML format.

    snapshot:
        myproduct:
            aws_access_key_id: "XXXXXXXXXXXXXXXXXXXX"
            aws_secret_access_key: "XXXXXXXXXXXXXXXXXXX"
            s3_bucket_name: "mybucket"
            s3_bucket_region: "eu-west-1"
            s3_base_path: "mybackup"
            autoscale_group: "myautoscalegroup"

This means that from the command line and in cron it has a much cleaner usage as well.

    usage: run_snapshotter.py [-h] -p P

    Run Cassandra Snapshotter

    optional arguments:
      -h, --help  show this help message and exit
      -p P        Product as per config.yaml e.g. myproduct
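
The output handling described earlier comes down to running the command as a subprocess and feeding whatever it prints into a logger. The following is a rough sketch of that idea rather than the wrapper's actual code; the function name `run_and_log` and the logger name are made up, and the demo runs a harmless Python one-liner in place of the real snapshotter command.

```python
import logging
import subprocess
import sys


def run_and_log(cmd, logger):
    """Run cmd, send each line of its stdout/stderr to logger, return the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing leaks to cron
        universal_newlines=True,   # read text lines rather than bytes
    )
    for line in proc.stdout:
        line = line.rstrip()
        if line:
            logger.info(line)
    returncode = proc.wait()
    if returncode != 0:
        logger.error("command exited with status %d", returncode)
    return returncode


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    # Stand-in command; the real wrapper would build the snapshotter command here.
    demo_cmd = [sys.executable, "-c", "print('snapshot started')"]
    run_and_log(demo_cmd, logging.getLogger("snapshotter"))
```

Because the child's output is read through a pipe instead of being inherited from cron, nothing reaches the cron email unless the logging configuration sends it there, and a file handler can be added to keep a record of each run.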

At some point I will, hopefully, refactor the original to add this type of set-up and then submit a pull request.


