Sunday, August 14, 2011

Scheduling Regular Backups on Webfaction

For Leaguevine, we take backing up our information seriously and have deployed a custom backup system for our database. I figured I would share how we back up our databases here. Note that this sort of backup technique is certainly not a replacement for source control or backing up user uploads, but that's a topic for another day.

First, we have to decide on a schedule: which copies of the database we want to keep, and how long we want to keep them. For Leaguevine, we settled on the following scheme:
  • Hourly backups for the past week
  • Daily backups for the past month
  • Monthly backups indefinitely
After defining a schedule, creating these backups on Webfaction is as easy as creating some folders and setting up commands that periodically dump the database into those folders. Thus, we have a directory for each of the hourly, daily, and monthly backups. To dump the data into files within these directories and write the success or failure of each dump to a log, we create new cron jobs.
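As a rough sketch, the directory layout might look like this (the /path-to-backups prefix is just a placeholder for wherever you keep your backups):

mkdir -p /path-to-backups/leaguevine/hourly
mkdir -p /path-to-backups/leaguevine/daily
mkdir -p /path-to-backups/leaguevine/monthly

For example, if we open up our list of cron jobs with crontab -e, we can add a single command that handles both the dump and the logging. It might look something like: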

30 * * * * /usr/local/pgsql/bin/pg_dump -Ft -U db_username db_name > /path-to-backups/leaguevine/hourly/leaguevine-hourly-`date +\%a\%H`.tar 2>> /path-to-backups/leaguevine/hourly/backups.log && echo "Database backup completed successfully on `date`" >> /path-to-backups/leaguevine/hourly/backups.log

There is a lot going on here. First, the 30 * * * * at the start of the line is the schedule, which makes this job run every hour on the half hour.
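For reference, the five schedule fields in a cron line are minute, hour, day of month, month, and day of week, so:

30 * * * *    =  run at minute 30 of every hour, every day of every month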

Next, we specify the database name and username for the PostgreSQL database that Leaguevine uses. For pg_dump to run without prompting for a password, we need to set up a .pgpass file, which only takes a second.
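As a quick sketch, setting it up might look like this (the host, port, and password here are placeholders; each line of .pgpass has the form hostname:port:database:username:password, and PostgreSQL ignores the file unless it is readable only by you):

echo "localhost:5432:db_name:db_username:db_password" >> ~/.pgpass
chmod 600 ~/.pgpass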

The next thing this command does is specify the folder we want to dump the database to, and build the filename from the abbreviated day of the week and the hour of the day. This works for us because after a week passes, each new dump overwrites the file of the same name from the previous week, leaving a rolling window of one week of hourly backups.
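The daily and monthly jobs from our schedule follow the same pattern; only the cron schedule and the date format in the filename change. A rough sketch, using the same placeholder names and omitting the logging tail for brevity, might be:

30 2 * * * /usr/local/pgsql/bin/pg_dump -Ft -U db_username db_name > /path-to-backups/leaguevine/daily/leaguevine-daily-`date +\%d`.tar

30 3 1 * * /usr/local/pgsql/bin/pg_dump -Ft -U db_username db_name > /path-to-backups/leaguevine/monthly/leaguevine-monthly-`date +\%Y\%m`.tar

Keying the daily filename on the day of the month gives roughly a month of daily backups before old files are overwritten, while keying the monthly filename on the year and month means those files are never overwritten.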

The command then appends any errors to a file called backups.log, and if the dump succeeds it appends a date-stamped success message to that same file.
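So in the normal case, backups.log just accumulates one line per run along these lines (timestamps made up for illustration), and any pg_dump errors show up in the same file:

Database backup completed successfully on Sun Aug 14 00:30:02 CDT 2011
Database backup completed successfully on Sun Aug 14 01:30:01 CDT 2011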

This single command will create all of the hourly backups for a week without any manual maintenance. However, these backups have the shortcoming of residing on the same server as the production database, so if something happened and all of the data on that server were wiped out, the backups would be lost along with it. Thus, we additionally copy these backups to a different computer on a regular basis. To do this, you can just install another cron job on a different machine, like so:

40 * * * * . ~/.ssh-agent; scp -o PreferredAuthentications=publickey username@web160.webfaction.com:/path-to-backups/leaguevine/hourly/leaguevine-hourly-`date +\%a\%H`.tar /local-path-to-backups/leaguevine/hourly/ 2>> /local-path-to-backups/leaguevine/hourly/backups.log && echo "Database backup completed successfully on `date`" >> /local-path-to-backups/leaguevine/hourly/backups.log

This cron job is very similar to the previous one. First, it runs ten minutes after each hourly backup should have completed. Next, it uses scp to grab the backup and store it on the local disk. And finally, it handles the log the same way as before.

The trick here is to set up your .ssh-agent so that scp can run without prompting for a password. This isn't too hard, and Webfaction has good docs on how to use SSH keys.
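One common pattern, sketched here under the assumption that your key lives at ~/.ssh/id_rsa and that ~/.ssh-agent is simply the file where you choose to save the agent's environment, looks like this on the backup machine:

ssh-keygen -t rsa
ssh-copy-id username@web160.webfaction.com

ssh-agent > ~/.ssh-agent
. ~/.ssh-agent
ssh-add ~/.ssh/id_rsa

The first two commands create a key pair and install the public key on the server. The last three start an agent, save its environment variables to a file so the cron job can source them, and load the key into the agent, entering the passphrase once interactively.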

And that's it! With just a few lines in your cron files, you can have customized automated backups of your important data.