Backups are great!

On this page you can find a brief overview of the system I've put together to automate backups of my and my girlfriend's Mac laptops. I've got it running on an old Xbox with Linux installed, but any *nix (including OS X) would probably do.

The system's main features are:

  • The critical stuff is fully automated: nothing to forget!
  • The system efficiently creates daily snapshots, so we can see what our files looked like yesterday, last week or (once it's been running long enough) last year.
  • User-friendly: we can easily access our data using the Finder's "connect to server" feature.
  • The system automatically collects all music from our iTunes Libraries and re-exports it (and the rest of my mp3 collection) as a single shared library, visible from iTunes.
  • It all works just fine over our wireless network.

How it works

The heart of the system is Dirvish, which is in turn built on rsync, which comes preinstalled on OS X and most Linux distributions.

Dirvish uses rsync to create and maintain the snapshots. Thanks to some clever use of hard links, each snapshot uses very little disk space beyond the files that are new or have changed since the last backup. I use a similar trick when building the combined music collection: by hard-linking the files, I can have "copies" of the music wherever needed without wasting any disk space.
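To make the hard-link trick concrete, here is a minimal shell sketch. The file names are made up for illustration, and this is not how Dirvish itself is invoked. It only shows the underlying idea: a hard link is a second name for the same data, so the extra "copy" costs no space, and removing one name leaves the other intact.

```shell
#!/bin/sh
# Demonstrate that a hard link is just another name for the same file data.
# All file names below are hypothetical.

demo_dir=$(mktemp -d)
echo "some song data" > "$demo_dir/original.mp3"

# Create a second directory entry for the same inode; no data is copied.
ln "$demo_dir/original.mp3" "$demo_dir/linked.mp3"

# Both names report the same inode number.
inode_a=$(ls -i "$demo_dir/original.mp3" | awk '{print $1}')
inode_b=$(ls -i "$demo_dir/linked.mp3" | awk '{print $1}')
echo "inodes: $inode_a $inode_b"

# Removing one name does not remove the data; the other name still works.
rm "$demo_dir/original.mp3"
cat "$demo_dir/linked.mp3"
```

This is also why an old snapshot can be deleted safely: unchanged files stay on disk for as long as any other snapshot still links to them.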

I use Samba to re-export the backed up files and make them available, and mt-daapd to share the collected music with iTunes.

As I'm running Debian on the Linux machine, in most cases all I had to do was run "apt-get install PROGRAM" to install the software. A bunch of configuration files and shell scripts hold everything together.

Configuring the laptops and network

The only change I had to make to the laptops was to enable OS X's built-in SSH server (System Preferences: Sharing: Services: Remote Login) and configure them to allow passwordless connections (authenticated by SSH key) from the Linux server to the root account.
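For readers who haven't set up key-based logins before, the sketch below shows roughly what is involved. The key file name and paths are assumptions for illustration, not the ones I actually used. The function appends a public key to an authorized_keys file, with the permissions sshd expects, and does nothing if the key is already there.

```shell
#!/bin/sh
# Sketch: authorize a public key for passwordless SSH logins.
# On the backup server, a key pair without a passphrase could be created with:
#   ssh-keygen -t rsa -f ~/.ssh/id_backup -N ""    (file name is an assumption)
# The function below covers what then has to happen on each laptop, for the
# account the server logs in as (root's home directory on OS X is /var/root).

# Usage: authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
authorize_key() {
    pubkey_file=$1
    auth_file=$2
    ssh_dir=$(dirname "$auth_file")

    # With its default strict checks, sshd refuses keys when the directory
    # or the file is writable by other users.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"

    # Append the key only if that exact line is not already present.
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Example (hypothetical paths):
#   authorize_key /tmp/id_backup.pub /var/root/.ssh/authorized_keys
```

Because the function is idempotent, it can be re-run after an OS reinstall without piling up duplicate entries.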

My wireless access point is configured to always give each laptop the same IP address, and the machines all have each other's IPs and hostnames in /etc/hosts. Without predictable IP addresses, most of this stuff would be much harder.

Configuring Dirvish

I ended up configuring Dirvish to only back up the /Users (the OS X equivalent to /home) tree from the laptops. Backups of the OS and apps are handled differently (see below).

Here are my Dirvish configuration files:

There we have my first deviation from "standard" Dirvish operation; there are two configuration files for Annie's machine. They are identical, aside from the "client:" setting. The primary file points to the IP address assigned to Annie's laptop when it's plugged into the ethernet, the secondary points to the address it has on the wifi. Choosing which file to use is done by the daily script discussed below.
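The choice between the two files can be sketched as a small shell function. This is an illustration, not the code from my script: the host names are placeholders, and the function only prints which configuration to use. It prefers the wired address, falls back to the wifi one, and fails if neither answers.

```shell
#!/bin/sh
# Sketch: decide which of two client configurations to use.
# Host names passed in are hypothetical; "primary" and "secondary" are
# just labels for the wired and the wifi configuration file.

# Succeeds if the host answers a single ping.
is_reachable() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Usage: pick_client_config WIRED_HOST WIFI_HOST
# Prints "primary" or "secondary"; returns non-zero if neither host is up.
pick_client_config() {
    wired_host=$1
    wifi_host=$2
    if is_reachable "$wired_host"; then
        echo primary
    elif is_reachable "$wifi_host"; then
        echo secondary
    else
        return 1
    fi
}

# Example (hypothetical host names):
#   config=$(pick_client_config annie-wired annie-wifi) || echo "laptop is off"
```

Checking the wired address first means the faster connection wins whenever both happen to be up.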

I also created users on the Linux machine matching the usernames used on our Macs. Dirvish is then configured with "numeric-ids: 0", which makes rsync match file owners by user name rather than by numeric ID, so the backup files end up with the same owners as they had on the Macs. This maintains privacy when re-exporting the backups.

Dirvish is invoked once a day by cron. Instead of calling Dirvish directly, cron starts a script I wrote: Dirvish-daily. This script uses ping to check whether the laptops are actually turned on and connected to the network. If they aren't, it waits and retries until they eventually show up and the backup can proceed. The script will also retry if the backup fails; I expect it to fail quite often, as we tend to turn our laptops off and on a lot.

(The way Dirvish-daily retries is the key to making this whole system work: we turn our laptops on and off many times a day, at relatively unpredictable times. I tried using the OS X power management scheduling to wake the machines up at night for backing up, but that doesn't seem to work if the lid is closed. So the system has to just keep trying and perform backups whenever the opportunity presents itself.)
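The retry logic described above boils down to a loop like the following. This is a minimal sketch rather than the actual Dirvish-daily script: the host name, the vault name and the five-minute delay are assumptions. The check and the backup are passed in as commands, so the loop itself doesn't care what is being backed up.

```shell
#!/bin/sh
# Sketch: keep trying until the laptop is reachable AND the backup succeeds.

# Usage: retry_until_done CHECK_CMD BACKUP_CMD [DELAY_SECONDS] [MAX_TRIES]
# MAX_TRIES of 0 (the default) means "retry forever".
retry_until_done() {
    check_cmd=$1
    backup_cmd=$2
    delay=${3:-300}
    max_tries=${4:-0}
    tries=0
    while :; do
        tries=$((tries + 1))
        # The backup only runs if the check passes; a failed backup
        # (e.g. the lid was closed halfway through) also triggers a retry.
        if $check_cmd && $backup_cmd; then
            return 0
        fi
        if [ "$max_tries" -gt 0 ] && [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$delay"
    done
}

# Hypothetical wiring for one laptop:
laptop_is_up() { ping -c 1 annie-laptop >/dev/null 2>&1; }
run_backup()   { dirvish --vault annie; }

# Example: try every five minutes until the backup has gone through.
#   retry_until_done laptop_is_up run_backup 300
```

Capping the number of tries is optional, but it keeps a stale run from overlapping with the one cron starts the next day.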

Configuring Samba

I didn't actually have to reconfigure Samba much. I simply activated the "home directory" section in /etc/samba/smb.conf, restarted, and created users with the "smbpasswd -a" command.

After each successful backup, the script Make-Userlinks runs and adds symlinks to my and Annie's home directories. These symlinks point to the Dirvish snapshots of our files. Samba followed symbolic links by default when I set this up, so this suffices to make the data accessible. (Newer Samba releases default to "wide links = no", which blocks links that point outside the share.)
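The linking step amounts to very little code. Here is a sketch of the idea, not the Make-Userlinks script itself; the snapshot path and link name in the example are invented. It creates a symlink in a home directory, or re-points an existing one at a newer snapshot.

```shell
#!/bin/sh
# Sketch: expose a snapshot inside a user's home directory via a symlink.

# Usage: link_snapshot SNAPSHOT_DIR HOME_DIR LINK_NAME
link_snapshot() {
    snapshot_dir=$1
    home_dir=$2
    link_name=$3
    # -s: symbolic link
    # -f: replace the link if it already exists
    # -n: treat an existing link to a directory as a plain file, so the
    #     new link replaces it instead of being created inside its target
    ln -sfn "$snapshot_dir" "$home_dir/$link_name"
}

# Example (hypothetical paths):
#   link_snapshot /backup/annie/20240102/tree/annie /home/annie latest-backup
```

Without -n, re-running the script would drop a stray link inside the previous snapshot instead of updating the one in the home directory.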

Exporting the music

This is currently the least polished part of the system; it's basically a shell script which uses "cp -l" to create hard-link copies of the backed up "iTunes Music" folders in the folder that mt-daapd scans for music.
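In outline, the export step looks something like this. It is a sketch assuming GNU cp (as shipped with Debian), with made-up paths, and it only handles exporting into a folder that doesn't yet contain the same files. Directories are recreated, while the music files themselves are hard-linked rather than copied.

```shell
#!/bin/sh
# Sketch: publish a backed-up music folder by hard-linking its files.

# Usage: export_music SOURCE_DIR DEST_DIR
#   SOURCE_DIR: a backed-up "iTunes Music" folder inside a snapshot
#   DEST_DIR:   the folder the music server scans
# Both must live on the same filesystem, since hard links cannot cross
# filesystem boundaries.
export_music() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    # -a: recurse and preserve attributes
    # -l: hard-link regular files instead of copying their data
    # "$src"/. copies the contents of the folder, not the folder itself.
    cp -al "$src"/. "$dest"/
}

# Example (hypothetical paths):
#   export_music "/backup/annie/latest/tree/annie/Music/iTunes/iTunes Music" /srv/music
```

The quoting matters here because the "iTunes Music" folder name contains a space.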

This will become more interesting once I've come up with a decent strategy for detecting and removing duplicate albums...

Backing up the OS and applications

As mentioned above, I decided against using Dirvish to back up the operating system and applications: it just took too long to run. Also, there is little value in having "live" access to those files, so in the interest of saving space I'd rather back them up to a tightly compressed archive.

So I lied at the top... the system isn't fully automated: backing up the OS and apps is still a manual procedure I do "when I feel like it."

But to make me more inclined to "feel like it" once in a while, I created a simple shell script to do the heavy lifting: OSX-sys-backup

So, that's it! That's my backup system. I hope this document inspires others to do similar or even better things with backups. I hereby place the shell scripts I wrote for this, linked above, in the Public Domain; feel free to use them as you please.

Comments are most welcome, either via e-mail or on the entry about this in my blog.