Incremental mysql backup using mysqldump and svn

The motivation

In my opinion, the backup of a database should be:

  • "Delta", meaning that you only copy what has changed, not the whole database. The point is to have a quick transfer between the database machine and the backup machine, and to keep the volume of storage under control (given that a backup is not only a copy of your data, but also of version N-1, N-2, etc...)
  • Text file if possible, as opposed to some binary format. The idea is that, if you have to restore your database on a different architecture, or you have lost your system and you must restore to a different database software, or you must restore them in a distant future and you might not have the database working, you have a better chance to be able to interpret and restore your data if it is "clear" text.

Two more features are desirable:

  • The system must be kept simple: the more complex it is, the less likely it is to work.
  • Having versioning is a nice addition.

The principle

The principle here is to use mysqldump to get the content of the database as text files, and then to create the "delta" with the previous version.

You can dump a mysql database using mysqldump. If you don't do anything particular you get a single text file, but it may be difficult to compute a delta between two such dumps.

Yet, with mysqldump --tab, you can dump a database with one file (actually two) per table, and one record per line inside each file. This way it's easier to compute the delta between two successive dumps.
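For instance, for a hypothetical database with two tables, articles and users, a dump with --tab leaves something like this in the target directory (the table names are made up):

ls dumpdir
articles.sql  articles.txt  users.sql  users.txt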

This could be done using a utility like diff, but a versioning system like svn will do the bookkeeping for you, providing the versioning for free.

The choice of the versioning system

svn is suitable here, because it stores a diff (a delta) between successive versions. git is not suitable for this purpose, because it stores a full copy of every version of each file (deltas only appear when objects are packed). cvs does not handle binary files well, thus the choice of svn.

A property of mysqldump

The proper working of the system depends on the way mysqldump works. First, it dumps the records of a table in the order in which they are stored, which is more or less the order in which the data were added to the table. Thus the records always appear in the same order, provided there were no modifications. Moreover, some operations, such as the alteration of a numerical field inside a record, do not change the dump order. This is an important property: if the records appeared in, say, a random order changing with every dump, the delta could be much bigger than the vanilla dump of the data, defeating the whole mechanism. This property isn't guaranteed by mysql, so it may break in the future; on the other hand, it has worked since the beginning of mysqldump.
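To illustrate the point, here is the kind of delta this property gives you: a hypothetical svn diff after a single numerical field of one record has been updated (the table name, contents and revision number are made up):

svn diff articles.txt
Index: articles.txt
===================================================================
--- articles.txt	(revision 41)
+++ articles.txt	(working copy)
@@ -16,3 +16,3 @@
 16	Another title	7
-17	Some title	42
+17	Some title	43
 18	Third title	0

One changed record translates into one changed line, whatever the size of the table.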

The recipe

(This is a bit unix-flavoured, but should work on other operating systems as well.)

Creating the environment

First, we create a directory where everything will reside. We do all of this as the user that will perform the backup. We call the directory, for example, backup.

mkdir backup
cd backup

We then create the repository (you must have svn installed).

svnadmin create repository

We then create the svn-controlled directory, which we call current:

svn checkout file://`pwd`/repository current

Lastly, we create a directory to perform the dump. The reason for this additional directory is that, when using mysqldump --tab, the directory must be writable by the mysqld process, which preferably runs as a different user than the one used for the backup, for security reasons: several things may need a backup, and they should not share the same permissions. By creating a dedicated user, we mitigate this risk.

mkdir dumpdir
sudo chgrp mysql dumpdir
chmod 770 dumpdir

The mysql in the 2nd line is the (operating system) group of the user running the mysql daemon.

Adding permission to the database

Let's say the mysql user for accessing the database is 'mysqluser'@'localhost'. You must grant it the FILE privilege to be able to use mysqldump --tab.

mysql -u database-master-user -p -e "grant file on *.* to 'mysqluser'@'localhost';"

First run

mysqldump --tab dumpdir --user=mysqluser -p database_name
(type the password)

If all went well, you have in dumpdir two files for each table: a .sql file with the table structure, and a .txt file with the data. Each record is on a separate line (by default), but there may be additional newline characters if there was a newline inside a record (with a \ at the end of the preceding line to escape it).
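As an illustration, a record of a hypothetical articles table, with an id, a title, and a body containing one embedded newline, would appear in the .txt file as follows (the fields are separated by tabs):

7	My title	first line of the body\
second line of the body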

We then enter them in svn:

mv dumpdir/* current
cd current
svn add *
svn commit -m "Initial import" .
cd ..

The script

I put it in a bin directory:

mkdir bin
cat >bin/run
#!/bin/sh

here=`dirname $0`
pwd=`cd $here/../..; pwd`

passwd=database-password
dir=$pwd/backup
rm $dir/current/* # Cleanup current, note that .svn will still exist
mysqldump --skip-dump-date --tab $dir/dumpdir --user=mysqluser --password=$passwd database_name
mv $dir/dumpdir/* $dir/current
(cd $dir/current && svn commit --quiet -m `date +%Y%m%d` .; svn status)
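
End the input of cat with Ctrl-D, then make the script executable:

chmod +x bin/run

Remember to replace database-password, mysqluser and database_name with your own values.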

A few explanations

mysqldump

I use mysqldump with --skip-dump-date. This avoids having a modification in the dump file when the table hasn't changed. One may adopt the opposite view and keep the date: this results in a slightly bigger delta, but each version of the table dump is then tagged with the corresponding date from the database server's point of view.
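Concretely, without --skip-dump-date, each .sql file ends with a timestamped comment like the following (the date here is made up), which changes at every dump even when the table itself did not:

-- Dump completed on 2012-01-15 10:30:00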

svn status

If some tables are created or destroyed, you must take special care to add / delete them in the repository. Here the script does nothing special, but by performing a svn status you at least get a message telling you if some tables have been added or deleted.
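If you want to automate this, a minimal sketch could be inserted in the subshell of the script, before the svn commit (this assumes GNU xargs for the -r option, and table names without spaces):

svn status | awk '/^\?/ {print $2}' | xargs -r svn add     # newly created tables
svn status | awk '/^!/ {print $2}' | xargs -r svn delete   # dropped tables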

Done

Et voilà, you can put backup/bin/run in a crontab. Every time you copy the repository to the backup machine (with rsync for example), only the delta will be transferred.
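
For example, a crontab entry running the backup every night at 3:00 (the path is an assumption, adapt it to your setup):

0 3 * * * /home/backup-user/backup/bin/run

And the copy to the backup machine could look like this (host and paths are again made up):

rsync -a /home/backup-user/backup/repository/ backup-machine:backup/repository/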

Considerations for various applications

mediawiki

mediawiki puts the page access count in the same record as the page data. Thus even a mostly static site that only receives page views will show a lot of record modifications. You can mitigate this by setting $wgHitcounterUpdateFreq to some value in mediawiki's configuration.
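For example, in LocalSettings.php (the value 100 is just an illustration, meaning the counters are only written to the database in batches of roughly 100 hits):

$wgHitcounterUpdateFreq = 100;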

Most likely you don't need to back up the cache tables.
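If you want to exclude them from the dump, mysqldump accepts --ignore-table, to be repeated once per table (the database name and mediawiki's objectcache table are given here as illustrations):

mysqldump --skip-dump-date --tab $dir/dumpdir --user=mysqluser --password=$passwd \
    --ignore-table=database_name.objectcache database_name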
