GMail backup with offlineimap and Git

A couple of days ago I came across this excellent tip for downloading your GMail mail to your computer over IMAP. Before reading on, I assume you have read that article. Throughout this article, /path/to/gmail/backup is the path of the local repository you configured for offlineimap.

In the comments on that article, I pointed out a problem with duplicate files. GMail has no real folders. Instead you attach labels to e-mails, and over IMAP each label shows up as a folder. The same e-mail can therefore appear in more than one folder, and if you download all folders, you will likely store that e-mail several times.

One solution is to hide all unwanted folders in GMail's IMAP settings, so that only "All Mail" is visible over IMAP. The advantage is that you retrieve every e-mail exactly once; the downside is that you lose the folder structure.

The solution I proposed in the comments was to replace duplicate files in the local backup folder with hard links. You still download an e-mail once for every folder it appears in, but it is stored only once on disk. Install a tool like dupmerge and run the following command:

find /path/to/gmail/backup -type f -print0 | dupmerge
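For readers without dupmerge, the underlying idea can be sketched in plain bash. The helper hardlink_dups below is hypothetical and is not dupmerge's implementation; it assumes bash 4 or later (for the associative array) and GNU coreutils (sha1sum, cmp, ln). Files are grouped by SHA-1 and then compared byte by byte with cmp, so a hash collision cannot merge two different messages.

```bash
#!/bin/bash
# hardlink_dups DIR (hypothetical helper): replace byte-identical regular
# files under DIR with hard links to the first copy found.
# Assumes bash 4+ and GNU coreutils; skips DIR/.git if it exists.
hardlink_dups() {
  local dir=$1 file sum
  declare -A first    # content hash -> first path seen with that hash
  while IFS= read -r -d '' file; do
    sum=$(sha1sum < "$file")   # read via stdin so the output is just "HASH  -"
    sum=${sum%% *}             # keep only the hash
    if [[ -z ${first[$sum]:-} ]]; then
      first[$sum]=$file
    elif [[ ! ${first[$sum]} -ef $file ]] && cmp -s -- "${first[$sum]}" "$file"; then
      # Same bytes, different inode: replace this copy with a hard link.
      ln -f -- "${first[$sum]}" "$file"
    fi
  done < <(find "$dir" -path "$dir/.git" -prune -o -type f -print0)
}
```

Usage: hardlink_dups /path/to/gmail/backup. Running it a second time is harmless, because files that are already hard links of each other are skipped by the -ef test.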

This is still not really satisfactory: when I back up this folder, the hard links are not preserved, so the duplication reappears on my backup medium. That's where Git comes in. Git stores identical content only once, because it names each stored file (a blob) by the hash of its content, and it compresses everything. These are the steps to follow:

cd /path/to/gmail/backup
git init
git config core.compression 9
git add .    # '.' also includes names starting with a dot, which '*' skips
git commit -m "Initial commit."
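To see the deduplication in action, here is a small sketch in a throwaway repository; the folder and file names are invented for illustration. The same message stored under two labels produces two index entries but only one distinct blob id.

```bash
#!/bin/bash
# Sketch: one message stored in two label folders becomes a single Git blob.
# Runs in a throwaway directory; folder and file names are invented.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
mkdir -p INBOX/cur Work/cur
printf 'Subject: test\n\nSame body under two labels.\n' > INBOX/cur/msg1
cp INBOX/cur/msg1 Work/cur/msg1
git add .
# Each line of `git ls-files -s` is: mode, blob id, stage, path.
git ls-files -s
files=$(git ls-files | wc -l)
blobs=$(git ls-files -s | awk '{print $2}' | sort -u | wc -l)
echo "tracked files: $files, distinct blobs: $blobs"
```

Both paths point at the same blob id, so the message body is stored in the object database only once.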

The next step is not strictly required: at this point it is enough to back up only /path/to/gmail/backup/.git, since it contains everything. Instead, I decided to create a bare Git repository somewhere else, and that is what I actually back up:

git clone --bare --local /path/to/gmail/backup /path/to/gmailbare.git
cd /path/to/gmailbare.git
git config core.compression 9
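# A bare clone has no fetch refspec; map branches one-to-one so 'git fetch origin' updates them
git config remote.origin.url /path/to/gmail/backup
git config remote.origin.fetch '+refs/heads/*:refs/heads/*'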

With a bare repository, only Git's compressed object store is backed up (delta-compressed once the objects are packed), not the uncompressed checkout. This should save a fair amount of network traffic and disk space on your backup medium.
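Objects written by a commit are compressed one by one; Git only delta-compresses them when it packs them, for example during git gc. A minimal sketch, assuming you want to pack the bare repository before it is copied to the backup medium; pack_and_report is a hypothetical helper name, while git gc and git count-objects are standard Git commands.

```bash
#!/bin/bash
# pack_and_report REPO (hypothetical helper): pack REPO's objects so they
# are delta-compressed, then print object counts and sizes.
pack_and_report() {
  local repo=$1
  git -C "$repo" gc --quiet || return 1
  # -v: detailed counts ("count" = loose objects, "size-pack" = packed size)
  # -H: human-readable sizes
  git -C "$repo" count-objects -vH
}
```

Usage: pack_and_report /path/to/gmailbare.git. After packing, the "count" line should drop to 0 and "size-pack" shows what will end up on the backup medium.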

Now the full backup script looks like the following snippet:

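#!/bin/bash

GMAILBACKUP=/path/to/gmail/backup
GITBACKUP=/path/to/gmailbare.git

# Sync (newer offlineimap releases call this UI "quiet": -u quiet)
/usr/bin/offlineimap -u Noninteractive.Quiet

# Commit to git; abort if the directory is missing rather than run git elsewhere
cd "$GMAILBACKUP" || exit 1
# Stage new, modified and deleted files. This only updates the index;
# it never removes anything from the maildir checkout.
git add -A .
# Commit only if something changed (an empty commit would fail)
if ! git diff --cached --quiet; then
    git commit -q -m "Backup $(date +%Y%m%d%H%M)"
fi

cd "$GITBACKUP" || exit 1
git fetch -q origin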

You can run this script from a cron job, which will silently update the repository for you.

Be careful, though, if you decide to use a different solution from the one above. Using Git to track the maildir got me into serious trouble: git rm without the --cached flag deletes files from your checkout as well, and when it removes the last file in a directory, Git removes the now-empty directory too. Always use --cached, so that only the index changes and your checkout is left alone.
For instance, if you omit the flag, you may end up removing a folder called new. This violates the maildir structure, and on the next run offlineimap will start complaining:

maildir: makefolder called with arg 'INBOX/'
maildir: Is dir? False

This raises an unhandled exception, and offlineimap stops immediately. So be very careful to preserve the maildir structure!
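Git does not track empty directories at all, so a checkout (or a clone restored from the bare backup) can lack an empty new or tmp directory. A minimal sketch that restores the maildir structure; repair_maildirs is a hypothetical helper, and it assumes that every maildir folder still contains at least one of cur, new or tmp and that no label is itself named cur, new or tmp.

```bash
#!/bin/bash
# repair_maildirs ROOT (hypothetical helper): make sure every maildir folder
# under ROOT has all three of cur/, new/ and tmp/. A folder is recognised by
# containing at least one of them; ROOT/.git is skipped.
repair_maildirs() {
  local root=$1
  find "$root" -path "$root/.git" -prune -o \
       -type d \( -name cur -o -name new -o -name tmp \) -print0 |
  while IFS= read -r -d '' sub; do
    folder=$(dirname -- "$sub")
    # mkdir -p is a no-op for directories that already exist
    mkdir -p -- "$folder/cur" "$folder/new" "$folder/tmp"
  done
}
```

Usage: repair_maildirs /path/to/gmail/backup, for example right after restoring a clone and before running offlineimap again.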

So this article showed how you can back up all your GMail e-mail to your computer. Tracking the contents with Git solves the duplication problem and makes the backup efficient in both disk space and network traffic. If you run the script regularly, you will still have your e-mail should an escaped monster eat all the data at Google.

Feedback on this backup solution is appreciated, drop me a note or leave a comment if you like.
