Posted: Sat 2nd Jun 2007, 8:06pm
The rsync man page describes rsync as a "faster, flexible replacement for rcp". Rsync copies files between remote hosts and the local filesystem, although it can also copy files between two locations on the same machine.
Here is a quick rsync test where we set up a directory root with two files, test1 and test2, and rsync them to another directory, backup, with the command rsync -a -v root/ backup/. Here's the shell output:
james@bose:~$ mkdir root
james@bose:~$ cd root/
james@bose:~/root$ echo "Test1" > test1
james@bose:~/root$ echo "Test2" > test2
james@bose:~/root$ ls -li
total 8
33132 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test1
33182 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test2
james@bose:~/root$ cd ..
james@bose:~$ mkdir backup
james@bose:~$ rsync -a -v root/ backup/
building file list ... done
./
test1
test2
sent 209 bytes received 70 bytes 558.00 bytes/sec
total size is 12 speedup is 0.04
james@bose:~$ ls -li backup/
total 8
33184 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test1
33187 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test2
james@bose:~$
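The -a option copies files in archive mode, which recurses into directories and preserves symlinks, permissions, modification times, owner and group. The -v option stands for verbose and outputs information while rsync is working. You can also use -vv and -vvv to get progressively more verbose messages.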
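One detail worth noting in the command above is the trailing slash on root/. The following is a small sketch, using throwaway directories created with mktemp (all names here are illustrative and not from the original test), of how the result differs with and without it:

```shell
# Trailing-slash demo in a throwaway directory (all names are illustrative).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/root"
echo "Test1" > "$demo_dir/root/test1"
echo "Test2" > "$demo_dir/root/test2"

if command -v rsync >/dev/null 2>&1; then
    # With a trailing slash the contents of root/ land directly in with_slash/.
    rsync -a "$demo_dir/root/" "$demo_dir/with_slash/"
    # Without it rsync creates no_slash/root/ and copies the files into that.
    rsync -a "$demo_dir/root" "$demo_dir/no_slash/"
    ls -R "$demo_dir/with_slash" "$demo_dir/no_slash"
else
    echo "rsync is not installed; nothing copied" >&2
fi
```

If you are unsure which form you want, running both against a scratch directory like this is a cheap way to check before pointing rsync at real data.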
This works nicely but it is much more interesting to copy files from a remote server. Using the -e option we can specify the remote shell to use, in this case ssh. You can also specify the remote shell by setting the RSYNC_RSH environment variable instead of specifying a value with -e.
rsync -a -v -e ssh email@example.com: web10
This copies the user's home directory to a directory called web10 on the local machine. Bear in mind rsync will only be able to copy files which the user you specified has permission to read.
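As a rough illustration of the RSYNC_RSH alternative mentioned above, the sketch below sets the variable once and then omits -e. The host name is a made-up placeholder, so the command is printed rather than run:

```shell
# Set the remote shell once for this session instead of passing -e each time.
RSYNC_RSH="ssh"
export RSYNC_RSH

# remote_src is a placeholder; substitute your own user@host: value.
remote_src="user@backup.example.org:"

# Print the command rather than run it, since the host above is made up.
echo rsync -a -v "$remote_src" web10
```

Putting the export in your shell profile would apply it to every rsync you run, which may or may not be what you want on a machine that also uses rsync daemons.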
To specify a different remote directory you can put a path after the : character, for example:
rsync -a -v -e ssh firstname.lastname@example.org:/home/user web10
You might want to customise how the remote shell behaves, for example this might be a better option for the backups:
rsync -a -v -e "ssh -c arcfour -o Compression=no -x" email@example.com:/home/user web10
Here is what the options to -e mean:
ssh - use ssh instead of the default of rsh
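-c arcfour - selects the arcfour cipher, which is fast but weak; recent OpenSSH releases have removed it, so on a modern system you may need to drop this option or pick another cipher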
-o Compression=no - Turns off ssh's compression - rsync has its own if you want it which we'll discuss in a minute
-x - turns off ssh's X11 forwarding (if you actually have it on by default)
If bandwidth is a problem you might want to use the -z option to have rsync compress data it sends across the network. If you are using rsync compression it makes sense not to use ssh's compression in the way demonstrated above. Here's the command using rsync compression:
rsync -a -v -z -e "ssh -c arcfour -o Compression=no -x" firstname.lastname@example.org:/home/user web10
If you want to test how effective each of these commands is, you will need to delete the web10 directory rsync creates before each run, otherwise rsync will only copy files which have changed. Whilst that's normally what you want, it isn't too useful for tests.
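A minimal sketch of such a test run is below. The helper name time_fresh_copy is hypothetical, and the demonstration copies a small local directory so that it can run anywhere; for a real comparison you would pass your remote source and add the -e and -z options you want to measure:

```shell
# time_fresh_copy SRC DEST: delete DEST, then time a full copy into it.
# (Hypothetical helper; whole-second resolution only.)
time_fresh_copy() {
    bench_src=$1
    bench_dest=$2
    # Refuse an empty destination so the rm below can never hit the wrong path.
    [ -n "$bench_dest" ] || { echo "usage: time_fresh_copy SRC DEST" >&2; return 1; }
    rm -rf -- "$bench_dest"        # force rsync to transfer everything again
    bench_start=$(date +%s)
    rsync -a "$bench_src" "$bench_dest" || return 1
    bench_end=$(date +%s)
    echo "copied $bench_src in $((bench_end - bench_start))s"
}

# Local demonstration with throwaway data.
if command -v rsync >/dev/null 2>&1; then
    bench_dir=$(mktemp -d)
    mkdir "$bench_dir/src"
    echo "payload" > "$bench_dir/src/file"
    time_fresh_copy "$bench_dir/src/" "$bench_dir/web10"
fi
```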
Finally, since rsync is very efficient it can saturate a network connection. If you still want to be able to use your network connection whilst rsync is running you can use the --bwlimit option which allows you to specify a maximum transfer rate in kilobytes per second. Due to the nature of rsync transfers, blocks of data are sent, then if rsync determines the transfer was too fast, it will wait before sending the next data block. The result is an average transfer rate equaling the specified limit. For example to limit rsync to using 100KB/sec you could do this:
rsync -a -v -z --bwlimit=100 -e "ssh -c arcfour -o Compression=no -x" email@example.com:/home/user web10
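To get a feel for what a limit means in practice, here is a back-of-the-envelope sketch. It assumes the --bwlimit value is in units of 1024 bytes per second and ignores compression and protocol overhead, so treat the answer as a rough estimate; transfer_seconds is a hypothetical helper name:

```shell
# transfer_seconds SIZE_KIB LIMIT_KIB_PER_SEC: rough duration of a transfer.
transfer_seconds() {
    size_kib=$1
    limit_kib=$2
    # Round up so a partial final second still counts.
    echo $(( (size_kib + limit_kib - 1) / limit_kib ))
}

# 500 MiB of changed data at --bwlimit=100
transfer_seconds $((500 * 1024)) 100    # prints 5120
```

So at 100KB/sec a 500MB change set takes roughly 85 minutes, which is worth knowing before you schedule a nightly job.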
You might also want to use --progress so that rsync prints the percentage complete and the transfer speed while transferring large files (this isn't worth adding if you are running from a cron job). If you are making a backup which you might want to restore at some point in the future you should use --numeric-ids. This tells rsync to transfer the numeric UID and GID values as they are rather than mapping them by user and group name, which is very important for avoiding permission problems when restoring. You might also want the -H option, which makes rsync preserve hard links between the files it copies, and the -x option, which stops rsync crossing filesystem boundaries so that it only copies files from one filesystem and not from any others mounted within that directory structure. You can also use the --delete option, which deletes files from the backup if they no longer exist on the server. With the rsync versions current at the time of writing, --delete removes the files before the copying starts; rsync 3.0.0 and later delete during the transfer by default, and --delete-before restores the older behaviour.
Putting this all together, the command I use to back up one of my servers looks something like this:
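Since --delete removes data, it can be worth previewing what it would do first. The sketch below uses rsync's -n (--dry-run) option on two throwaway local directories; the directory and file names are illustrative only:

```shell
# --delete preview in a throwaway directory (all names are illustrative).
del_dir=$(mktemp -d)
mkdir "$del_dir/server" "$del_dir/backup"
echo "keep" > "$del_dir/server/keep.txt"
echo "stale" > "$del_dir/backup/stale.txt"    # exists only in the backup

if command -v rsync >/dev/null 2>&1; then
    # -n reports what would be copied and deleted without changing anything.
    preview=$(rsync -a -v -n --delete "$del_dir/server/" "$del_dir/backup/")
    echo "$preview"
    # The real run copies keep.txt and removes stale.txt from the backup.
    rsync -a --delete "$del_dir/server/" "$del_dir/backup/"
    ls "$del_dir/backup"
fi
```

The dry run should include a line mentioning that stale.txt would be deleted, which is your chance to spot a mistyped source or destination before anything is lost.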
rsync -aHxvz --delete --progress --numeric-ids -e "ssh -c arcfour -o Compression=no -x" firstname.lastname@example.org:/ pauli/
Bear in mind rsync is not much good at backing up running databases such as MySQL because they frequently hold information in memory, so although you may have a copy of the database files, when you restore them you might find the information they contain is corrupt.
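One common workaround, sketched here as an assumption rather than something from my own setup, is to have the server write an SQL dump to a file before the backup runs, so that rsync copies the dump instead of relying on the live database files. The helper name dump_mysql and the MYSQLDUMP override are hypothetical; the mysqldump options shown are standard ones:

```shell
# dump_mysql OUT_FILE: write a full SQL dump to OUT_FILE (hypothetical helper).
# MYSQLDUMP may be set to override the command, e.g. to add credentials.
dump_mysql() {
    out_file=$1
    [ -n "$out_file" ] || { echo "usage: dump_mysql OUT_FILE" >&2; return 1; }
    # Write to a temporary name first so a half-finished dump never
    # appears under the final name.
    # --single-transaction gives a consistent snapshot for InnoDB tables only.
    if ${MYSQLDUMP:-mysqldump} --all-databases --single-transaction > "$out_file.tmp"; then
        mv "$out_file.tmp" "$out_file"
    else
        rm -f "$out_file.tmp"
        return 1
    fi
}

# Example (run on the server, shortly before the backup pulls the files):
# dump_mysql /home/user/mysql-all.sql
```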
For further reading about rsync have a look at the rsync man page or Kevin Korb's rsync article.