
Link-Backup
Copyright (c) 2004-2006 Scott Ludwig
http://www.scottlu.com

Download lb.py v0.8 | Download viewlb.cgi v0.1

Link-Backup is a backup utility that creates hard links between a series of backed-up trees, and intelligently handles renames, moves, and duplicate files without additional storage or transfer.

Transfer occurs over standard I/O, locally or remotely, between a client instance and a server instance of this script. Remote backups rely on the secure remote shell program ssh.

Link-Backup comes with a web-based viewer of the backups it makes. Thanks, Joe Beda.

Usage:

lb [options] srcdir dstdir
lb [options] user@host:srcdir dstdir
lb [options] srcdir user@host:dstdir

Either the source or the destination can be remote. Each backup is stored under a dated directory with the following entries:

dstdir/YYYY.MM.DD-HH.MM:SS/tree/
dstdir/YYYY.MM.DD-HH.MM:SS/log
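
As a rough illustration of the layout above, the following sketch builds the two paths for one backup. The function name backup_paths is hypothetical, and the use of local time is an assumption; the text only specifies the YYYY.MM.DD-HH.MM:SS pattern.

```python
import os
import time

def backup_paths(dstdir, when=None):
    """Return (tree_dir, log_file) for one dated backup under dstdir.

    `when` is a time.struct_time; the current local time is used if omitted
    (local time is an assumption, not something the text above states).
    """
    stamp = time.strftime("%Y.%m.%d-%H.%M:%S", when or time.localtime())
    root = os.path.join(dstdir, stamp)
    return os.path.join(root, "tree"), os.path.join(root, "log")
```

Calling backup_paths("pictures-backup") returns the tree directory and log file path for a backup started now.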

Options:

    --verify                Run rsync with --dry-run to cross-verify
    --numeric-ids           Keep uid/gid values instead of mapping; requires root
    --minutes <minutes>     Only run for <minutes>. Incremental backup.
    --showfiles             Don't back up; only list srcdir-relative paths of files needing backup
    --catalogonly           Update catalog only
    --filelist <- or file>  Read srcdir-relative paths of files to back up from file (- for stdin)
    --lock                  Ensure only one backup to a given dest will run at a time
    --verbose               Show what is happening
    --ssh-i <file>          Select id file to use for authentication (ssh -i)
    --ssh-C                 Use ssh compression (ssh -C)
    --ssh-p <port>          ssh port on remote host (ssh -p)

Comments:

Link-Backup tracks unique file instances in a tree and creates a backup that, while identical in structure, stores no file more than once. Files that are moved, renamed, or duplicated don't cause additional storage or transfer. dstdir/.catalog is a catalog of all unique file instances; backup trees hard-link to the catalog. If a backup tree would be identical to the previous backup tree, it is not created.

How it works:

The src sends a file list to the dst. First dst updates the catalog by checking to see if it knows about each file. If not, the file is retrieved from the src and a new catalog entry is made:

    For each file:
    1. Check to see if the file path + file stat is present in the last tree.
    2. If not, ask for md5sum from the src. See if md5sum+stat is in the catalog.
    3. If not, see if md5sum only is in the catalog. If so, copy the catalog entry
       and name the copy with md5sum+new stat.
    4. If not, request file from src, make new catalog entry.
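
The four steps above can be sketched with in-memory dictionaries. This is only an illustration of the decision order, not the code in lb.py: the names classify and ask_md5, the shape of the stat tuple, and the returned labels are all hypothetical.

```python
import hashlib

def md5_of(data):
    """Hex md5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

def classify(path, stat, last_tree, catalog, ask_md5):
    """Decide what dst has to do for one file from the src file list.

    last_tree: dict mapping path -> stat tuple from the previous backup tree
    catalog:   dict mapping md5 digest -> set of stat tuples stored with it
    ask_md5:   callable(path) -> md5 digest, standing in for a request to src
    """
    # Step 1: same path and same stat as the last tree, nothing to transfer.
    if last_tree.get(path) == stat:
        return "unchanged"
    # Step 2: ask src for the checksum and look for an md5sum+stat entry.
    digest = ask_md5(path)
    stats = catalog.get(digest)
    if stats and stat in stats:
        return "in-catalog"
    # Step 3: contents are known but the stat differs; copy the entry on dst
    # under a name carrying the new stat. No transfer from src is needed.
    if stats:
        stats.add(stat)
        return "copy-entry"
    # Step 4: contents are unknown; the file must be requested from src.
    catalog[digest] = {stat}
    return "fetch"
```

Only the "fetch" outcome moves file contents between machines, which is why renamed, moved, or duplicated files cost no extra transfer.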

Catalog files are named by md5sum+stats and stored in flat directories. Once complete, a tree is created that mirrors the src by hardlinking to the catalog.
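
A minimal on-disk sketch of this idea follows, assuming a filesystem that supports hard links. The exact catalog file name format is not given here, so catalog_name is a hypothetical scheme (md5 digest plus mode, uid, gid, and mtime), and add_to_catalog and build_tree are illustrative helper names.

```python
import hashlib
import os
import shutil

def catalog_name(digest, st):
    # Hypothetical naming scheme: md5 digest plus a few stat fields.
    return "%s-%o-%d-%d-%d" % (
        digest, st.st_mode & 0o7777, st.st_uid, st.st_gid, int(st.st_mtime))

def add_to_catalog(catalog_dir, src_path):
    """Copy src_path into the flat catalog unless an identical entry exists."""
    with open(src_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    entry = os.path.join(catalog_dir, catalog_name(digest, os.stat(src_path)))
    if not os.path.exists(entry):
        shutil.copy2(src_path, entry)
    return entry

def build_tree(tree_dir, entries):
    """Mirror the src layout by hard-linking each path to its catalog entry.

    entries maps a src-relative path to the catalog entry holding its data.
    """
    for rel, entry in entries.items():
        target = os.path.join(tree_dir, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(entry, target)  # new directory entry, no new file data
```

Two source files with identical contents and stats end up as a single catalog entry, and both tree paths share that entry's inode.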

Example 1:

python lb.py pictures pictures-backup

Makes a new backup of pictures in pictures-backup.

Example 2:

python lb.py pictures me@fluffy:~/pictures-backup

Backs up on remote machine fluffy instead of locally.

Example 3:

python lb.py --minutes 240 pictures me@remote:~/pictures-backup

Same as Example 2, except that the backup runs for at most 240 minutes. This is useful when backing up over the internet only during specific times (at night, for example). Link-Backup does what it can in 240 minutes. If the catalog update completes, a tree is created, hardlinked to the catalog.
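
The time-budget behavior can be pictured with the sketch below. It is an assumption-laden illustration rather than lb.py's implementation: run_for, process, and the injectable clock are hypothetical names, and the deadline is only checked between items here.

```python
import time

def run_for(minutes, work_items, process, clock=time.monotonic):
    """Process items until all are done or `minutes` have elapsed.

    Returns the items that were not reached. An empty result means the
    catalog update completed, so a tree could then be built.
    """
    deadline = clock() + minutes * 60
    remaining = list(work_items)
    while remaining and clock() < deadline:
        process(remaining.pop(0))
    return remaining
```

Because finished work stays in the catalog, a later run with the same arguments only has to handle whatever is returned as remaining.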

Example 4:

python lb.py --showfiles pictures pictures-backup | python lb.py --filelist - pictures pictures-backup

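Same result as Example 1: the first command lists the files needing backup, and the second reads that list from standard input and backs those files up.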

Example 5:

1) python lb.py --showfiles pictures me@remote:~/pictures-backup | python lb.py --filelist - pictures me@laptop:~/pictures-transfer

2) python lb.py --catalogonly pictures-transfer me@remote:~/pictures-backup

3) python lb.py pictures me@remote:~/pictures-backup

If the difference between pictures and pictures-backup (for example) is too large to back up over the internet, the steps above can be used. Step 1 transfers only the differences to a laptop. Step 2 is run from the laptop once it is at the location of machine "remote", and updates the catalog on "remote". Step 3 is run back at the source; it does a backup, notices that all the files are already present in the remote catalog, and builds the tree.

Note that the source in step 2 could be specified more precisely as the backup tree created underneath the pictures-transfer directory. This is not necessary, since only the catalog is being updated, but it would be faster.

History:

v 0.8 12/24/2006 scottlu
  - Allow backup of any file while it is changing
  - Added --verbose logging to tree building
  - Minor --verify command fix

v 0.7 09/02/2006 scottlu
  - Ignore pipe, socket, and device file types
  - Added --ssh-i to select ssh id file to use (see ssh -i) (Damien Mascord)
  - Added --ssh-C to perform ssh compression (see ssh -C) (David Precious)
  - Added --ssh-p to specify remote port (see ssh -p) (David Precious)

v 0.6 06/17/2006 scottlu
  - Ignore broken symlinks and other failed stats during filelist creation
    (David Precious)
  - Added --lock, which ensures only one backup to a given dest can occur
    at a time (Joe Beda)

v 0.5 04/15/2006 scottlu
  - Added 'latest' link from Joe Beda http://eightypercent.net (thanks Joe!)
  - Fixed --verify. It wasn't specifying the remote machine (I rarely use
    verify but sometimes it is nice to sanity check backups)

v 0.4 11/14/2004 scottlu
  - Changed to a central catalog design with backup trees hardlinking to the catalog.
    This way catalog updating can be incremental.
  - Removed filemaps - not required any longer
  - Changed logging to occur in the catalog as well as backups. Changed log parsing
    methods accordingly
  - Added incremental backup feature --minutes <minutes>
  - Make md5hash calculation incremental so a timeout doesn't waste time
  - Created 0.3-0.4.py for 0.3 to 0.4 upgrading
  - Added --showfiles, shows differences between src and dst
  - Added --catalogonly, updates catalog only, doesn't create tree
  - Added --filelist, specifies file list to use instead of tree
  - Removed --rmempty
  - Added --verbose

v 0.3 9/10/2004 scottlu
  - Added backup stat query methods
  - Changed log file format
  - Added viewlb.cgi, a web interface for viewing backups
  - Added gzip compression of filemap
  - Added --numeric-ids

v 0.2 8/28/2004 scottlu
  - Filemap format change
  - Added --rmempty
  - Added --verify to run rsync in verify mode
  - Added uid/gid mapping by default unless --numeric-ids is specified

v 0.1 8/19/2004 scottlu
  - Fully working backup, hardlinking between trees

License:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
last modified December 24, 2006, at 09:36 AM