Backing Up A Linux Directory To The Cloud

We use Amazon S3 to back up a myriad of directories and data dumps from our local development and public live servers.  The storage is cheap, easily accessible, and sits in a remote third party location with decent resilience.  It is also reasonably secure, as long as you do not share your bucket information and key files with anyone else.

In this article we explore the task of backing up a Linux directory to an S3 bucket from the command line.  It assumes you have signed up for Amazon Web Services (AWS) and enabled S3 on your account, all of which can be done through the web interface at Amazon.

Step 1 : Get s3tools Installed

The easiest way to interface with Amazon from the command line is to install the open source s3tools toolkit, which provides the s3cmd utility.  You can get it from http://www.s3tools.org/.  If you are on a Red Hat based distribution you can create the yum repo file and simply do a yum install.  On all other distributions you will need to download the source and build it yourself (in practice, running python setup.py install).
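As a rough sketch, the two routes look something like the following (the exact repo URL depends on your release, so check s3tools.org for the right file):

# Red Hat / CentOS: drop the s3tools repo file in place, then install
cd /etc/yum.repos.d
wget http://s3tools.org/repo/RHEL_6/s3tools.repo
yum install s3cmd

# Any other distribution: unpack the downloaded source and install
tar xzf s3cmd-*.tar.gz
cd s3cmd-*
python setup.py install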

Once you have s3cmd installed you will need to configure it.  Run the following command (note that you will need the access key and secret key from your Amazon AWS account):
s3cmd --configure
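Once configured, a quick sanity check is to list your buckets.  If your keys were entered correctly this returns without error:

s3cmd ls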

Step 2 : Create A Simple Backup Script

Go to the directory you wish to back up and create the following script named backthisup.sh:

#!/bin/sh
SITENAME='mysite'
# Create a tarzip of the directory
echo 'Making tarzip of this directory...'
tar cvzf backup.tgz --exclude=backup.tgz ./*
# Make the s3 bucket (ignored if already there)
echo 'Create bucket if it is not there...'
s3cmd mb s3://backup.$SITENAME
# Put that tarzip we just made on s3
echo 'Storing files on s3...'
s3cmd put backup.tgz s3://backup.$SITENAME

Note that this is a simple backup script.  It tarzips the current directory and pushes the whole archive to the s3 bucket.  That is fine for a quick backup, but it is not the best approach for ongoing, repeated backups.  Most of the time you will want a differential backup, putting only the files that have changed or been newly created into the s3 bucket.  AWS charges you for every put and get operation as well as for bandwidth.  Granted the fees are low, but every penny counts.
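That said, if you do want this simple version to run on a schedule, a cron entry does the job.  A sketch, assuming the script lives in /var/www/mysite (substitute your own path) and is executable:

# Run the backup every night at 2:30 AM (add via crontab -e)
30 2 * * * cd /var/www/mysite && ./backthisup.sh >> /var/log/backthisup.log 2>&1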

Next Steps : Differential Backups

If you don’t want to push every file to the server each time the script runs, you can do a differential backup instead.  This is easily accomplished with s3tools by using the sync command in place of put.  We leave the details to a future article.
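As a preview, the core of it is a one-liner.  sync compares the local directory against what is already in the bucket and only transfers the differences (--delete-removed additionally deletes remote copies of files you have removed locally):

s3cmd sync --delete-removed ./ s3://backup.mysite/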