Backup software/strategies

Ben Okopnik [ben at linuxgazette.net]


Wed, 11 Jul 2007 09:24:40 -0400

All of us know - at least I hope we do - that we should all be doing regular backups; that's a given. Chances are, though, that we've all skipped one or two (or more) on the schedule ("heck, nothin's happened so far; it shouldn't make too much of a difference...") - that is, if you even have a schedule, rather than relying on some vague sense of "it's probably about time..." I have to admit that, as much as I advise my customers to have a solid backup plan, I'm less than stellar about having a polished, perfect plan myself.

In part, this is because backups are much easier with a desktop than a laptop - or even with a desktop to which you synch the laptop once in a while. Operating purely from a laptop, as I do, means that I don't have an always-connected tape drive (or whatever) - and doing a backup is always a hassle, involving digging out the external HD, hooking it up, and synchronizing. Moreover, since I do a lot of travelling, setting an alarm doesn't seem to be very useful; it usually goes off while I'm on the road, at which point I can only glare at it in frustration.

As with so many things, what I really need is a copy of "at" installed in my brain... but lacking that, well, I decided to dump the problem on you folks. :)

Can anyone here think of a sensible backup plan for the situation that I've described - laptop, external backup, arbitrary schedule - and some way to set up a schedule that works with that? Also, does anyone have a favorite WRT backup software? I really miss the ability to do incremental backups; that would be awf'lly nice (I don't mind carrying a few DVDs with me, and using the external HD for a monthly full backup.)

Good ideas in this regard - whether they completely answer the question or not - are highly welcome.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



René Pfeiffer [lynx at luchs.at]


Wed, 11 Jul 2007 17:44:15 +0200

On Jul 11, 2007 at 0924 -0400, Ben Okopnik appeared and said:

> All of us know - at least I hope we do - that we should all be doing
> regular backups; that's a given. Chances are, though, that we've all
> skipped one or two (or more) on the schedule [...]

Most people have cronjobs that automatically skip backups, so they don't need to worry. ;)

I've given backup strategies a lot of thought lately, and I'm still not happy with some of the solutions and tools.

> [...] Can anyone here think of a sensible backup plan for the
> situation that I've described - laptop, external backup, arbitrary
> schedule - and some way to set up a schedule that work with that?

You could use a combination of rsync, OpenSSH, and Perl: set up an rsyncd on your laptop, have a backup machine look for the laptop on the network, and whenever the server sees it, create an SSH tunnel and grab the latest deltas. http://backuppc.sourceforge.net/ has something like this, according to its feature list:

"Supports mobile environments where laptops are only intermittently
connected to the network and have dynamic IP addresses (DHCP)."
BackupPC uses hard links for identical files, thus saving a lot of disk space when backing up multiple servers and clients.
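
A bare-bones sketch of that kind of pull, leaving BackupPC itself aside (the hostname, the paths, and the assumption that SSH keys are already in place are all illustrative, not part of any existing setup):

# Run periodically on the backup machine; does nothing if the laptop is away.
LAPTOP=laptop.local                      # assumed name the laptop answers to
DEST=/srv/backup/laptop                  # assumed destination on the backup box
if ping -c 1 -W 2 "$LAPTOP" >/dev/null 2>&1; then
    rsync -az --delete -e ssh "ben@$LAPTOP:/home/ben/" "$DEST/home/ben/"
fi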

> Also, does anyone have a favorite WRT backup software? I really miss
> the ability to do incremental backups; that would be awf'lly nice (I
> don't mind carrying a few DVDs with me, and using the external HD for
> a monthly full backup.)

For me, the usual suspects are rsync and rdiff-backup when it comes to moderate amounts of storage. Indexing would be nice; in theory every filesystem should offer something like that, but most don't. I think I'll explain what I mean in a separate posting to TAG.
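
For anyone who hasn't tried it, rdiff-backup's basic usage is roughly this (the paths are only illustrative):

# Incremental backup: keeps the latest mirror plus reverse diffs.
rdiff-backup /home/ben /mnt/external/laptop-backup

# Restore a single file as it existed ten days ago.
rdiff-backup -r 10D /mnt/external/laptop-backup/notes.txt /tmp/notes.txt

# Drop increments older than three months to reclaim space.
rdiff-backup --remove-older-than 3M /mnt/external/laptop-backup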

I have yet to walk through the complete list at http://linuxmafia.com/pub/linux/backup/00index.txt since I do incremental backups by parsing the rsync logs of the backup server or looking for timestamps. Mostly I stick to mirrored repositories because I don't want to go through several incrementals when restoring something. Most of the time I need to restore everything and don't need incrementals. :)

Best, René.




Ben Okopnik [ben at linuxgazette.net]


Wed, 11 Jul 2007 12:44:25 -0400

On Wed, Jul 11, 2007 at 05:44:15PM +0200, René Pfeiffer wrote:

> On Jul 11, 2007 at 0924 -0400, Ben Okopnik appeared and said:
> > All of us know - at least I hope we do - that we should all be doing
> > regular backups; that's a given. Chances are, though, that we've all
> > skipped one or two (or more) on the schedule [...]
> 
> Most people have cronjobs that automatically skip backups, so they don't
> need to worry. ;)
> 
> I've given backup strategies a lot of thinking lately and I am still not
> happy with some solutions and tools.

[nod] That's where I am as well. I've been doing this iterative "pick a backup strategy, wrestle with it for a while, give up in frustration, let time pass until the frustration level decreases to a manageable level, repeat" thing for years now, and haven't come up with anything truly solid or positive - so it's been a "whenever memory and capability coincide" method by default - and I'm extremely dissatisfied with that.

> > [...] Can anyone here think of a sensible backup plan for the
> > situation that I've described - laptop, external backup, arbitrary
> > schedule - and some way to set up a schedule that work with that?
> 
> You could use a combination of rsync, OpenSSH and Perl in order to set
> up a rsyncd on your laptop, have a backup machine look for your laptop
> on the network and whenever the server sees it create an SSH tunnel and
> grab the latest deltas. http://backuppc.sourceforge.net/ has something
> like this according to its feature list:
> 
> "Supports mobile environments where laptops are only intermittently
> connected to the network and have dynamic IP addresses (DHCP)."

The problem with that is that I don't have a "home base" - it's just not feasible to have an always-on connection on the boat - and using space on someone else's machine isn't really feasible. At that point, my ideas run off in the direction of buying rack space, and all the hassle that entails - especially since St. Augustine, for all its charm, is the technological backwoods where the hoot owls trod the chickens.

As a result, it never comes to pass. And I'm still searching for a workable strategy. :/

> Mostly I stick to mirrored repositories because
> I don't want to go through several incrementals when restoring
> something. Most of the time I need to restore everything and don't need
> incrementals. :)

In my case, that's not the usual situation; in fact, restoring a complete backup would only be necessary in case of catastrophic failure or a new laptop. 99% of the time, I'd want to restore a specific file or two.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Karl-Heinz Herrmann [khh at khherrmann.de]


Wed, 11 Jul 2007 21:17:34 +0200

Hi Ben,

On Wed, 11 Jul 2007 09:24:40 -0400 Ben Okopnik <[email protected]> wrote:

> In part, this is because backups are much easier with a desktop than a
> laptop - or even with a desktop to which you synch the laptop once in a
> while. Operating purely from a laptop, as I do, means that I don't have
> an always-connected tape drive (or whatever) - and doing a backup is
> always a hassle, involving digging out the external HD, hooking it up,
> and synchronizing. Moreover, since I do a lot of travelling, setting an
> alarm doesn't seem to be very useful; it usually goes off while I'm on
> the road, at which point I can only glare at it in frustration.

Hmm... keeping the schedule might be the tougher part in this.

> Can anyone here think of a sensible backup plan for the situation that
> I've described - laptop, external backup, arbitrary schedule - and some
> way to set up a schedule that work with that? Also, does anyone have a
> favorite WRT backup software? I really miss the ability to do incremental

I use BackupPC at work and, for a while now, at home. It's a suite of Perl scripts using rsync/tar/... as file-transfer backends, SSH tunnels included. A web frontend lets you trigger backups, retrieve files for restore, and browse the archives. The concept of incremental/full backups is also built in. Usually it runs backups on a schedule -- but with a single-drive laptop and an external drive, one possible way to do this would be: leave the laptop on during the night and let it do its backup (that would require connecting the drive, running /etc/backuppc/start, and a few clicks in the web frontend -- or just waiting for the automatic backup to kick in), 7 incrementals, and a full one once a week. Drive space permitting, BackupPC lets you keep several full backups as an archive, so removed files will be recoverable.
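
Roughly, that manual evening routine might look like this (the mount point and the init script path are assumptions -- Debian's package installs /etc/init.d/backuppc, but adjust for your own setup):

mount /dev/sdb1 /var/lib/backuppc     # external drive holds the BackupPC pool
/etc/init.d/backuppc start            # or your local start script
# ...trigger the backup from the web frontend, or wait for the schedule...
/etc/init.d/backuppc stop
umount /var/lib/backuppc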

Disadvantage: after copying the files, BackupPC will spend a lot of time figuring out which are identical and hard-linking (optionally compressing) them. This can take longer than the original transfer :-/

It could handle a laptop that is only occasionally connected to a backup PC somewhere. In that case the BackupPC "server" would run on the PC, just fetching the files from the laptop -- after which the laptop could be disconnected again. Incremental backups of /home/user could be pretty quick that way -- but you would need that running server.

> backups; that would be awf'lly nice (I don't mind carrying a few DVDs
> with me, and using the external HD for a monthly full backup.)

BackupPC would need a drive. As far as I know, DVD backups (especially multi-DVD ones) are not possible. All the history features and hard-linking need not just a drive but a decent filesystem as well.

Maybe not quite the solution you were looking for -- but maybe it starts you thinking? I like the features of BackupPC for local networks -- it is perfect for nightly backups. It also handles floating laptops, including a reminder mail every week (or whatever you set it to).

But then -- the last mail I got was: Your PC (khhlap) has not been successfully backed up for 155.0 days. Your PC has been correctly backed up 2 times from 155.1 to 155.0 [...]

Not that anything much happened to that laptop during that time :-)

(My much used 24/7 PC at home is backing itself up to an external drive daily.... )

K.-H.




Kapil Hari Paranjape [kapil at imsc.res.in]


Thu, 12 Jul 2007 09:17:28 +0530

Hello,

On Wed, 11 Jul 2007, Ben Okopnik wrote:

> All of us know - at least I hope we do - that we should all be doing
> regular backups; that's a given.

Like one of the "existence" proofs in mathematics --- this gives no hint of how one arrives at a solution :-)

> Can anyone here think of a sensible backup plan for the situation that
> I've described - laptop, external backup, arbitrary schedule - and some
> way to set up a schedule that work with that?

There are (at least) two different reasons why one (who uses a laptop or desktop) might create a backup:

	1. Protection against catastrophic break down or lack of access to the laptop's disk.
	2. Protection against accidental deletion of important work files.

As far as (1) is concerned, I have already written out a kind of strategy in LG #140. To that one can add the fact that laptops are not "always on", so "cron"-type scheduling is irrelevant. So one approach is to periodically schedule "laptop maintenance mornings":

	a. Do a backup.
	b. Run all those pending cron jobs with "anacron -s".
	c. Send all those pending "popularity-contest" and bug report mailings, run "cruft" and do house-keeping.
	d. Possibly clean the real cruft from the screen and keyboard.
For (a), I would use a simple script that invokes LVM snapshots and rsync. I would not bother with incremental backups for (1).
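
A bare-bones version of such a script might look like this (the volume group, snapshot size, and target paths are invented for illustration):

#!/bin/sh
# Snapshot the home LV, rsync the frozen copy to the external drive, clean up.
lvcreate --size 1G --snapshot --name homesnap /dev/vg0/home
mkdir -p /mnt/snap
mount -o ro /dev/vg0/homesnap /mnt/snap
rsync -aH --delete /mnt/snap/ /mnt/external/home-backup/
umount /mnt/snap
lvremove -f /dev/vg0/homesnap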

One solution for (2) is to use version control for everything important. With a typical system like "git" this means all that is backed-up uses about double (or more[*]) space than it would otherwise. However, it also means that recovery of deleted files is quick and easy.

Since (1) takes care of backing up the repository as well, you have the necessary incremental backup too.
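
As a minimal illustration of (2) with git (the directory and file names are invented):

# Put a work directory under version control once...
cd ~/work
git init
git add .
git commit -m "initial import"

# ...and recovering an accidentally deleted file is later a one-liner:
git checkout -- important-notes.txt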

Having said this, I have only partially implemented (2) and (1) is often postponed. Note that (1) tends to run into lunch and the late evening as well :-)

Obviously, the strategy for a network of computers that share files and are always on would be quite different.

Regards,

Kapil.

[*] When is an old version to be dumped? Look at René's posting for some thoughts on this.

--




Ben Okopnik [ben at linuxgazette.net]


Thu, 12 Jul 2007 12:57:06 -0400

On Wed, Jul 11, 2007 at 09:17:34PM +0200, Karl-Heinz Herrmann wrote:

> Hi Ben,
> 
> 
> On Wed, 11 Jul 2007 09:24:40 -0400
> Ben Okopnik <[email protected]> wrote:
> 
> > In part, this is because backups are much easier with a desktop than a
> > laptop - or even with a desktop to which you synch the laptop once in a
> > while. Operating purely from a laptop, as I do, means that I don't have
> > an always-connected tape drive (or whatever) - and doing a backup is
> > always a hassle, involving digging out the external HD, hooking it up,
> > and synchronizing. Moreover, since I do a lot of travelling, setting an
> > alarm doesn't seem to be very useful; it usually goes off while I'm on
> > the road, at which point I can only glare at it in frustration.
> 
> Hmm... keeping the schedule might be the tougher part in this. 

In theory, that would be the job of something like "at"; however, that's triggered off by a) time markers and b) booting the computer (at least as far as I understand it.) The question is, how do I get an alarm to go off when a) a backup is due or overdue and b) I'm where I can actually do something about it?

(Perl's Telepathy module is still in development, I'm afraid. The docs actually say so. :)

> > Can anyone here think of a sensible backup plan for the situation that
> > I've described - laptop, external backup, arbitrary schedule - and some
> > way to set up a schedule that work with that? Also, does anyone have a
> > favorite WRT backup software? I really miss the ability to do incremental
> 
> I use backuppc at work and since a while now at home. This is a suite
> of perl scripts using rsync/tar/... as file transfer backends, ssh
> tunnels included. Web frontend allows to trigger backups, get restore
> files, browse the archives. The concept of incremental/full backups is
> also builtin. 

I've installed it; it's a very nifty looking piece of software, and very much the kind of thing that I was thinking about. That's probably what I'll end up using, although I'll be triggering it manually. Thanks!

> Usually it would run backups when scheduled -- but with a
> sinlge drive laptop and external drive a possible way to do this would
> be: Leave the laptop on during night and let it do its backup (would
> require connecting the drive, /etc/backuppc/start and a few clicks in
> the web-frontend -- or just waiting for the automatic backup kicking
> in) 7 incremental, once a week a full one. Drive space permitting
> backuppc allows you to keep several full backups as archive. removed
> files will be recoverable. 

The problem there would be that I can't really leave the 'top on every night. I have to make every erg of power on board - whether via gasoline-powered generator, solar panels, or wind generator (I have all three) - and leaving a big power drain like a laptop on all night is just not doable.

I neglected to mention that because... well, fish aren't aware of water. It's just there. For me, that's how budgeting power works.

> backuppc would need a drive. As far as I know DVD backups (especially
> multi-DVD's) are not possible. All the history features and
> hardlinking needs not just a drive but some decent file system as well.

I could probably fake it out by creating a virtual FS on the laptop, doing a 'backup' to that, and copying the VFS to the DVD when done. In fact, that's probably a really good technique for when I'm on the road; heck, I probably wouldn't even need a DVD (a single CDROM would do.)
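
Something along these lines, perhaps (the sizes, paths, and burning command are just one way to do it):

# Create a CD-sized ext3 image, mount it loopback, and point the backup at it.
dd if=/dev/zero of=/var/tmp/backup.img bs=1M count=650
mkfs.ext3 -F /var/tmp/backup.img
mkdir -p /mnt/vbackup
mount -o loop /var/tmp/backup.img /mnt/vbackup

# ...run the backup into /mnt/vbackup here...

umount /mnt/vbackup
# Later, store the image as a file on a data DVD.
growisofs -Z /dev/dvd -R -J /var/tmp/backup.img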

> Maybe not quite the solution you were looking for -- but maybe it
> starts you thinking? 

It does - and that's exactly what I was looking for, since my own ideas had mostly hit dead ends and blank walls. Thanks, Karl-Heinz!

> I like the features of backuppc for local networks
> -- it is perfect for nightly daily backups. But it does handle floating
> laptops, including a reminder mail every week (or whatever you set it
> to).

And that would be a key feature of whatever scheduling scheme I came up with. In a lot of ways, I use my inbox to remind myself of pending and ongoing tasks.

> (My much used 24/7 PC at home is backing itself up to an external drive
> daily.... )

There are times I envy people with desktops - but the ratio of what I gain by having a laptop to what I'd gain by having a desktop is very strongly tilted toward the former. Which doesn't stop me from bitching about what I'm missing, of course - or trying to find ways of getting it anyway.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Ben Okopnik [ben at linuxgazette.net]


Fri, 13 Jul 2007 14:20:41 -0400

On Thu, Jul 12, 2007 at 09:17:28AM +0530, Kapil Hari Paranjape wrote:

> Hello,
> 
> On Wed, 11 Jul 2007, Ben Okopnik wrote:
> > All of us know - at least I hope we do - that we should all be doing
> > regular backups; that's a given.
> 
> Like one of the "existence" proofs in mathematics --- this gives no
> hint of how one arrives at a solution :-)

Heh. That's why mathematics professors' favorite phrase tends to be "...the solution is trivial, and therefore left to the student."

> > Can anyone here think of a sensible backup plan for the situation that
> > I've described - laptop, external backup, arbitrary schedule - and some
> > way to set up a schedule that work with that?
> 
> There are (at least) two different reasons why one (who uses a
> laptop or desktop) might create a backup:
> 
> 	1. Protection against catastrophic break down or lack of
> 	access to the laptop's disk.

That carries about 75% of the weight in my situation...

> 	2. Protection against accidental deletion of important work
> 	files.

...and this is the rest of it.

> As far as (1) is concerned, I have already written out a kind of
> strategy in LG #140. 

What, the "back it up if you haven't already" part? I've got that one handled.

> To that one can add the fact the laptops are not
> "always on" so that "cron" type scheduling is irrelevant. 

This, of course, is why I mentioned "at" - which will either run at the specified time, or keep retrying if it misses the original schedule.

> So one
> approach is to periodically schedule "laptop maintenance mornings":

Which brings us back to the original problem. "Every other Tuesday where the date is not divisible by 3 or 7 and is not within 4 days of the end of month" can be scheduled; "7 days after last backup assuming that I'm at home, otherwise as soon as I get home and have had a chance to unpack and catch up on sleep" cannot.

> 	a. Do a backup.
> 	b. Run all those pending cron jobs with "anacron -s".
> 	c. Send all those pending "popularity-contest" and bug report
> 	mailings, run "cruft" and do house-keeping.
> 	d. Possibly clean the real cruft from the screen and
> 	keyboard.
> 
> For (a), I would use a simple script that invokes LVM snapshots and
> rsync. I would not bother with incremental backups for (1).
> 
> One solution for (2) is to use version control for everything
> important. With a typical system like "git" this means all that
> is backed-up uses about double (or more[*]) space than it would
> otherwise. However, it also means that recovery of deleted files is
> quick and easy.
> 
> Since (1) takes care of backing up the repository as well, you have
> the necessary incremental backup too.
> 
> Having said this, I have only partially implemented (2) and (1) is
> often postponed. Note that (1) tends to run into lunch and the late
> evening as well :-)

[laugh] Which creates another scheduling problem if you can't do it all in one swell foop. I'm not worried about that one at all.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



René Pfeiffer [lynx at luchs.at]


Sat, 14 Jul 2007 13:18:58 +0200

On Jul 13, 2007 at 1420 -0400, Ben Okopnik appeared and said:

> On Thu, Jul 12, 2007 at 09:17:28AM +0530, Kapil Hari Paranjape wrote:
> [...]
> > To that one can add the fact the laptops are not
> > "always on" so that "cron" type scheduling is irrelevant.
> This, of course, is why I mentioned "at" - which will either run at the
> specified time, or keep retrying if it misses the original schedule.

Shouldn't anacron take care of this? AFAIK anacron checks for missed cron jobs since the last boot and reschedules them. Provided that the cron jobs check for the presence of a suitable backup device/server (and don't offer all data to the first device in range with the right IP address), this could work, don't you think?
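
As a sketch (with invented paths and a made-up filesystem label), the anacrontab entry could be something like "7 10 backup.weekly /usr/local/sbin/backup-if-possible", with the script itself checking whether the right drive is plugged in:

#!/bin/sh
# Only act if the known external drive (found by its filesystem label) is present.
mount -L BACKUPDISK /mnt/backup 2>/dev/null || exit 0
rsync -aH --delete /home/ /mnt/backup/home/
umount /mnt/backup

(One caveat: anacron considers the job done for that period once the command has run, whether or not the drive was actually there.)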

> > So one approach is to periodically schedule "laptop maintenance
> > mornings":
> Which brings us back to the original problem. "Every other Tuesday
> where the date is not divisible by 3 or 7 and is not within 4 days of
> the end of month" can be scheduled; "7 days after last backup assuming
> that I'm at home, otherwise as soon as I get home and have had a
> chance to unpack and catch up on sleep" cannot.

Sounds as if your cron jobs need some serious plugins. ;)

> > 	a. Do a backup.
> > 	b. Run all those pending cron jobs with "anacron -s".

Ah, I missed anacron at the first reading of Kapil's posting.

> > Having said this, I have only partially implemented (2) and (1) is
> > often postponed. Note that (1) tends to run into lunch and the late
> > evening as well :-)
> [laugh] Which creates another scheduling problem if you can't do it all
> in one swell foop. I'm not worried about that one at all.

Roughly how much data are we talking about per interval between backups? I just want to get a rough estimate.

Best wishes, René.




Ben Okopnik [ben at linuxgazette.net]


Sat, 14 Jul 2007 13:36:07 -0400

On Sat, Jul 14, 2007 at 01:18:58PM +0200, René Pfeiffer wrote:

> On Jul 13, 2007 at 1420 -0400, Ben Okopnik appeared and said:
> > On Thu, Jul 12, 2007 at 09:17:28AM +0530, Kapil Hari Paranjape wrote:
> > [...]
> > > To that one can add the fact the laptops are not
> > > "always on" so that "cron" type scheduling is irrelevant. 
> > 
> > This, of course, is why I mentioned "at" - which will either run at the
> > specified time, or keep retrying if it misses the original schedule.
> 
> Shouldn't anacron take care of this? AFAIK anacron checks for missed
> cron job since the last boot and reschedules them. Provided that the
> cron jobs check for the presence of a suitable backup device/server (and
> don't offer all data to the frist device in range with the right IP
> address) this could work, don't you think?

Ah. I was thinking of "anacron" when I said "at". Right.

> > > So one approach is to periodically schedule "laptop maintenance
> > > mornings":
> > 
> > Which brings us back to the original problem. "Every other Tuesday
> > where the date is not divisible by 3 or 7 and is not within 4 days of
> > the end of month" can be scheduled; "7 days after last backup assuming
> > that I'm at home, otherwise as soon as I get home and have had a
> > chance to unpack and catch up on sleep" cannot.
> 
> Sounds as if your cron jobs need some serious plugins. ;)

That Telepathy module would come in handy, yes. :)

> > > 	a. Do a backup.
> > > 	b. Run all those pending cron jobs with "anacron -s".
> 
> Ah, I missed anacron at the first reading of Kapil's posting.
> 
> > > Having said this, I have only partially implemented (2) and (1) is
> > > often postponed. Note that (1) tends to run into lunch and the late
> > > evening as well :-)
> > 
> > [laugh] Which creates another scheduling problem if you can't do it all
> > in one swell foop. I'm not worried about that one at all.
> 
> About how much data per time period between backups are we talking? I
> just want to get a rough estimate.

Relatively minor, I'd say. Assuming that I want to back up at least once per week, here are the factors that go into it:

1) New packages: I'm not at a point yet where I have all the packages I want installed, but I'm certainly past the half-way point; I'm probably installing one or two a week now, maximum. In any case, even if I lose those, it's just not a huge factor - I can always reinstall. Worst-case scenario: maybe 20MB worth of changes.

2) User data: This is the important stuff, of course. LG-related work in /var, changes in /home and /usr/local/... that's pretty much it. I seriously doubt that this would exceed 10MB in any given week.

So, really, we're looking at a max of 10MB that's absolutely critical, with another 20MB that would be nice to have but no major loss if it doesn't happen.

Hmm. Having said that, it's clear that I can do incrementals using a flash drive, and schedule the full backups when I'm home (say, an alarm every Monday, ignoring those that happen on the road.) Now the question becomes, what do I use? "dump" is... crude. "backuppc" requires a hard drive (well, maybe. I'll have to play with it and find out.) Any other suggestions?
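
One low-tech way to do the flash-drive incrementals, for comparison, would be GNU tar's --listed-incremental mode (the snapshot-file location and paths are only illustrative):

# Monthly full backup to the external drive; delete home.snar first to force level 0.
tar --create --gzip --listed-incremental=/var/backups/home.snar \
    --file=/mnt/external/home-full.tgz /home

# Weekly incrementals against the same snapshot file -- small enough for a flash drive.
tar --create --gzip --listed-incremental=/var/backups/home.snar \
    --file=/mnt/flash/home-incr-$(date +%F).tgz /home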

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Rick Moen [rick at linuxmafia.com]


Sat, 14 Jul 2007 11:28:27 -0700

Quoting Ben Okopnik ([email protected]):

> 1) New packages: I'm not at a point yet where I have all the packages I
> want installed, but I'm certainly past the half-way point; I'm probably
> installing one or two a week now, maximum. In any case, even if I lose
> those, it's just not a huge factor - I can always reinstall. Worst-case
> scenario: maybe 20MB worth of changes.

I would urge not including (not-locally-modified) upstream packages in one's backups -- because they are available at need directly from the distro's Internet package mirrors. Instead, just back up a list of installed packages that can later be used during a semi-automated restore / rebuild. E.g., on a Debian or similar system, include the following command's output in your backup sets:

$ sudo dpkg --get-selections "*" > /root/selections-$(date +%F)
Also include maps of your partition tables:

$ sudo /sbin/fdisk -l /dev/sda > /root/partitions-sda-$(date +%F)
$ sudo /sbin/fdisk -l /dev/sdb > /root/partitions-sdb-$(date +%F)
[etc.]
If there's a need for quick restore / rebuild, install a minimal system from installation media, then do:

# dpkg --set-selections < selections-[date string]
# apt-get dselect-upgrade
The above is taken from my own recipe for casual backups: http://linuxmafia.com/faq/Admin/linuxmafia.com-backup.html

I realise that you might be saying "Wait, I don't want to have to re-fetch all of my packages from Internet package mirrors over my very slow Net connection." OK, but having a full CD set of your preferred distro handy for reinstallations is still a better solution than including unaltered upstream package contents in your system backups.




Ben Okopnik [ben at linuxgazette.net]


Sat, 14 Jul 2007 15:31:20 -0400

On Sat, Jul 14, 2007 at 11:28:27AM -0700, Rick Moen wrote:

> Quoting Ben Okopnik ([email protected]):
> 
> > 1) New packages: I'm not at a point yet where I have all the packages I
> > want installed, but I'm certainly past the half-way point; I'm probably
> > installing one or two a week now, maximum. In any case, even if I lose
> > those, it's just not a huge factor - I can always reinstall. Worst-case
> > scenario: maybe 20MB worth of changes.
> 
> I would urge not including (not-locally-modified) upstream packages in
> one's backups -- because they are available at need directly from the
> distro's Internet package mirrors.

[ snip ]

> I realise that you might be saying "Wait, I don't want to have to
> re-fetch all of my packages from Internet package mirrors over my very
> slow Net connection."  OK, but having a full CD set of your preferred
> distro handy for reinstallations is still a better solution than
> including unaltered upstream package contents in your system backups.

I failed to make myself clear, obviously. I'm not interested in having backups of the packages themselves; what I meant was that I'm installing a couple of packages a week on my system, and a full backup will inevitably include their contents. Assuming that my most recent backup is within a couple of years (!) of the current "state of the state", a full restore plus an "apt-get update && apt-get upgrade" should bring things up to spec.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Rick Moen [rick at linuxmafia.com]


Sat, 14 Jul 2007 13:50:24 -0700

Quoting Ben Okopnik ([email protected]):

> what I meant was that I'm installing a couple of packages a week on
> my system, and a full backup will inevitably include their contents.

I don't mean to be difficult, but that's not inevitable -- nor, I would maintain, desirable. It's my view that one should carefully avoid backing up any packages that can be provided easily from distro-packaged archives, in the event of rebuild / restore. Therefore, one should exclude /bin, /sbin, /lib, and most of /usr (except /usr/local, and other odd exceptions one may find, such as /usr/lib/cgi-bin, that may have locally installed files).

If you make sure every backup set has a catalogue of currently installed package names, then that plus locally generated files are literally all that need be backed up. Here is a complete list of directories that do need backing up, on my server:

/root                        Root user's home directory (includes above files)
/etc                         System configuration files
/usr/lib/cgi-bin             CGI scripts
/var/lib/mysql               MySQL database files (dump if not quiescent)
/boot/grub/menu.lst          GRUB bootloader configuration
/var/spool/exim4             Exim and SA-Exim internal files
/var/spool/news              NNTP news spool for Leafnode
/var/spool/mail              SMTP mail spool
/var/lib/mailman/lists       Mailing list definitions for Mailman
/var/lib/mailman/archives    Mailing list archives for Mailman
/usr/local                   Locally installed files and records
/var/www                     Public http, ftp, rsync tree
/home                        Non-root users' home trees
Making sure that list is complete for a given system requires some study of one's system. What I'm saying is that such study, as an alternative to just copying everything, is worth the trouble.
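
For what it's worth, once such a list exists, the backup itself is nearly a one-liner (the list file and destination are illustrative):

# /root/backup-dirs.txt holds one path per line -- the list above, adapted.
tar czpf /mnt/external/system-$(date +%F).tgz --files-from=/root/backup-dirs.txt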

> Assuming that my most recent backup is within a couple of years (!) of
> the current "state of the state", a full restore plus an "apt-get
> update && apt-get upgrade" should bring things up to spec.

Yes, but so would my approach, with significantly smaller backup sets.




Ben Okopnik [ben at linuxgazette.net]


Sat, 14 Jul 2007 19:01:02 -0400

On Sat, Jul 14, 2007 at 01:50:24PM -0700, Rick Moen wrote:

> 
> If you make sure every backup set has a catalougue of currently
> installed package names, then that plus locally generated files are
> literaly all that need be backed up.  Here is a complete list of
> directories that do need backing up, on my server:
> 
> /root                        Root user's home directory (includes above files)
> /etc                         System configuration files
> /usr/lib/cgi-bin             CGI scripts
> /var/lib/mysql               MySQL database files (dump if not quiescent)
> /boot/grub/menu.lst          GRUB bootloader configuration
> /var/spool/exim4             Exim and SA-Exim internal files
> /var/spool/news              NNTP news spool for Leafnode
> /var/spool/mail              SMTP mail spool
> /var/lib/mailman/lists       Mailing list definitions for Mailman
> /var/lib/mailman/archives    Mailing list archives for Mailman
> /usr/local                   Locally installed files and records
> /var/www                     Public http, ftp, rsync tree
> /home                        Non-root users' home trees
> 
> Making sure that list is complete for a given system requires some study
> of one's system.  What I'm saying is that such study, as an alternative
> to just copying everything, is worth the trouble.

I see your point, and I agree that it can be a valuable approach - in fact, it's well worth considering in my own case. The problem is that a solution like this can't be generalized to the average user; most people have no idea of what on their system has been "custom-modified" vs. being tweaked by package installation, and chances are good that they've also forgotten anything unusual that they have tweaked. E.g., as I sit here racking my brain for anything that falls outside the parameters, I've just recalled my travails with "ndiswrapper" to get my WiFi working - which required putting the driver files into a directory somewhere in "/usr/lib" (or was it "/usr/share"?) There was also broken behavior in several X apps - I've reported it to Ubuntu already, but the problem dates back to the changeover from X11 to Xorg - that required symlinking "/usr/lib/X11/fonts/" to "/usr/share/X11/fonts/".

In other words: in my experience, it's not nearly as easy to separate "system" files from "user" files as it should be. Now that I'm thinking about it, I've tried doing something like this before (although admittedly not in as nearly organized a fashion as you suggest), and ended up with a whole lot of pain and suffering - i.e., reconstructing all the customization that I'd applied to my old system over the years.

In theory, I should be keeping a log of all the changes of that sort that I apply. In reality, some of the tweaks were done under time pressure and as emergency fixes, and aren't "retrievable" other than going through the same PITA that caused them to come into being in the first place.

> > Assuming that my most recent backup is within a couple of years (!) of
> > the current "state of the state", a full restore plus an "apt-get
> > update && apt-get upgrade" should bring things up to spec.
> 
> Yes, but so would my approach, with significantly smaller backup sets.

That's what makes it worth considering, of course. Decreasing the size of the "full" backup would be a very useful thing.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Ben Okopnik [ben at linuxgazette.net]


Sat, 14 Jul 2007 19:22:22 -0400

Oh, and just as a comparison: Rick's approach, for my system, results in a backup set that's just a hair under 25GB; a full backup is a bit over 53GB. Pretty significant.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Kapil Hari Paranjape [kapil at imsc.res.in]


Sun, 15 Jul 2007 09:27:36 +0530

Hello,

On Sat, 14 Jul 2007, Rick Moen wrote:

> If you make sure every backup set has a catalougue of currently
> installed package names, then that plus locally generated files are
> literaly all that need be backed up.  Here is a complete list of
> directories that do need backing up, on my server:

I would add the following to Rick's list:

	/var/lib/dpkg		Saves a lot of config info (alternatives, diverts, ...)
				which "dpkg --get-selections" misses out on.
	-/var/lib/dpkg/info	Leave out the really large subdirectory
				which contains no information not in the packages
	/var/cache/debconf	Keep all the debconf answered questions
	/var/lib/aptitude	Saves the package dependency choices
One also needs to consider the (usually minor) problems that may occur if a package on the crashed system was out-of-date. When restoring from the upstream Debian archive or CD, this package would be replaced by a newer version.

On the whole I prefer to have at least one "mondo" style backup of the system which gets me to a familiar working environment even while one is trying to bring the dead back to life. With external bootable USB this is quite feasible. (This is a shameless plug for my article in LG #140!)

Regards,

Kapil. --




Mulyadi Santosa [mulyadi.santosa at gmail.com]


Sun, 15 Jul 2007 22:50:48 +0700

hi...

joining a bit late, but better late than never... :)

> Oh, and just as a comparison: Rick's approach, for my system, results in
> a backup set that's just a hair under 25GB; a full backup is a bit over
> 53GB. Pretty significant.

It reminds me of a simple but useful practice that makes backups easier. If you compile a program from source, make sure it's installed under a certain directory, let's say /usr/local. Later, you can just make symlinks from /usr/bin (or any other common binary location) to these binaries. Or you could build yourself an RPM/deb/whatever of these applications.

It brings two advantages, IMHO:

1. You don't need to recompile after doing a restore (probably after a bad accident).

2. You know which binaries come with the default distribution installation/upgrade. Of course, you could work that out with your package manager, but it takes time.
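
For the record, that usually amounts to the standard source-build steps plus the optional symlink mentioned above ("someprog" is just a placeholder):

# Build and install into /usr/local so the distro's own tree stays untouched.
./configure --prefix=/usr/local
make
sudo make install

# Optionally, link it into a "common" binary directory:
sudo ln -s /usr/local/bin/someprog /usr/bin/someprog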

regards,

Mulyadi




Rick Moen [rick at linuxmafia.com]


Sun, 15 Jul 2007 10:36:31 -0700

Quoting Ben Okopnik ([email protected]):

> I see your point, and I agree that it can be a valuable approach - in
> fact, it's well worth considering in my own case. The problem is that a
> solution like this can't be generalized to the average user; most people
> have no idea of what on their system has been "custom-modified" vs.
> being tweaked by package installation, and chances are good that they've
> also forgotten anything unusual that they have tweaked. 

My own experience is that there turned out to be only a few "surprise" locations -- the Mailman tree under /var/lib, MySQL's database files under /var/lib, and the CGIs under /usr/lib -- that only gradually occurred to me over about a week of occasionally pondering the problem and wandering the directory tree, looking around.

It helped that I always carefully leave "system" directories to distro control, with the exception of /etc, system CGIs, and Mailman configuration, and carefully make sure locally installed software went under /usr/local and not in the regular system tree.




Thomas Adam [thomas at edulinux.homeunix.org]


Sun, 15 Jul 2007 20:55:16 +0100

On Sun, Jul 15, 2007 at 10:50:48PM +0700, Mulyadi Santosa wrote:

> It reminds me about a simple but useful practice related to make
> backup easier. If you compile a program from the source, make sure
> it's installed under certain directory, let's say /usr/local. Later,
> we could just make a symlink from /usr/bin (or any other common binary

This is precisely what stow used to do before it was deprecated. This isn't really about making backups easier -- that location is where you should be installing programs you yourself have compiled.

> places) towards these binaries. Or, you could make yourself
> RPM/deb/whatever of these applications.

Better off just using debhelper with dh_make for that -- then it will install to /usr, where you can then use equivs and package holding to further that aim.

(And since you seem to be a Debian user, this is where deb-repack also helps.)

-- 
Thomas Adam
"He wants you back, he screams into the night air, like a fireman going
through a window that has no fire." -- Mike Myers, "This Poem Sucks".



Rick Moen [rick at linuxmafia.com]


Sun, 15 Jul 2007 14:06:11 -0700

Quoting Thomas Adam ([email protected]):

> "He wants you back, he screams into the night air, like a fireman going
> through a window that has no fire." -- Mike Myers, "This Poem Sucks".

You hard-hearted harbinger of haggis, you.

-- 
Cheers,                Re-elect Gore in '08.
Rick Moen              http://www.hyperorg.com/blogger/misc/gorespeech.html
[email protected]



Thomas Adam [thomas at edulinux.homeunix.org]


Sun, 15 Jul 2007 22:16:38 +0100

On Sun, Jul 15, 2007 at 02:06:11PM -0700, Rick Moen wrote:

> Quoting Thomas Adam ([email protected]):
> 
> > "He wants you back, he screams into the night air, like a fireman going
> > through a window that has no fire." -- Mike Myers, "This Poem Sucks".
> 
> You hard-hearted harbinger of haggis, you.

That's the one. :) "So I Married an Axe Murderer" is a terrible film, as was the poem; although I believe that was the point. :)

-- 
Thomas Adam
"He wants you back, he screams into the night air, like a fireman going
through a window that has no fire." -- Mike Myers, "This Poem Sucks".



Ben Okopnik [ben at linuxgazette.net]


Sun, 15 Jul 2007 18:42:23 -0400

On Sun, Jul 15, 2007 at 09:27:36AM +0530, Kapil Hari Paranjape wrote:

> Hello,
> 
> On Sat, 14 Jul 2007, Rick Moen wrote:
> > If you make sure every backup set has a catalougue of currently
> > installed package names, then that plus locally generated files are
> > literaly all that need be backed up.  Here is a complete list of
> > directories that do need backing up, on my server:
> 
> I would add the following to Rick's list:
> 
> 	/var/lib/dpkg		Saves a lot of config info (alternatives, diverts, ...)
> 				which "dpkg --get-selections" misses out on.
> 	-/var/lib/dpkg/info	Leave out the really large subdirectory
> 				which contains no information not in the packages
> 	/var/cache/debconf	Keep all the debconf answered questions
> 	/var/lib/aptitude	Saves the package dependency choices

Wouldn't this create problems by "misinforming" the package system about the current package state? Rick is saying "don't bring the packages along, reinstall them" - while the above (with the possible exception of /var/cache/debconf) will tell the system that they're already installed. It sounds to me like "apt-get" would just tell you "$blah is already the latest version" and exit for every single package that you had on the old system (while leaving those that hadn't been installed back then installable now.)

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Thomas Adam [thomas at edulinux.homeunix.org]


Sun, 15 Jul 2007 23:46:14 +0100

On Sun, Jul 15, 2007 at 06:42:23PM -0400, Ben Okopnik wrote:

> It sounds to me like "apt-get" would just tell you "$blah is already the
> latest version" and exit for every single package that you had on the
> old system (while leaving those that hadn't been installed back then
> installable now.)

Yes, which is why you always:

dpkg --get-selections > ./some_file # Old machine.
dpkg --set-selections < ./some_file # New machine.
apt-get dselect-upgrade

Then, under the hood, those files in /var/lib/* get updated accordingly.

-- 
Thomas Adam
"He wants you back, he screams into the night air, like a fireman going
through a window that has no fire." -- Mike Myers, "This Poem Sucks".



Ben Okopnik [ben at linuxgazette.net]


Sun, 15 Jul 2007 18:56:07 -0400

On Sun, Jul 15, 2007 at 11:46:14PM +0100, Thomas Adam wrote:

> On Sun, Jul 15, 2007 at 06:42:23PM -0400, Ben Okopnik wrote:
> > It sounds to me like "apt-get" would just tell you "$blah is already the
> > latest version" and exit for every single package that you had on the
> > old system (while leaving those that hadn't been installed back then
> > installable now.)
> 
> Yes, which is why you always:
> 
> ``
> dpkg --get-selections > ./some_file # Old machine.
> dpkg --set-selections < ./some_file # New machine.

Ah.

> apt-get dselect-upgrade
> ''
> 
> Then under the hood those files in /var/lib/* get updated accordingly.

Wouldn't the first two lines take care of that already? I'm missing the purpose of copying /var/lib/ here.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Rick Moen [rick at linuxmafia.com]


Sun, 15 Jul 2007 16:02:52 -0700

Quoting Ben Okopnik ([email protected]):

> Wouldn't the first two lines take care of that already? I'm missing the
> purpose of copying /var/lib/ here.

As am I.




Thomas Adam [thomas at edulinux.homeunix.org]


Mon, 16 Jul 2007 00:09:55 +0100

On Sun, Jul 15, 2007 at 06:56:07PM -0400, Ben Okopnik wrote:

> Wouldn't the first two lines take care of that already? I'm missing the
> purpose of copying /var/lib/ here.

Yep, and as for copying /var/lib -- it wasn't I who suggested it, so I can't say. :)

-- 
Thomas Adam
"He wants you back, he screams into the night air, like a fireman going
through a window that has no fire." -- Mike Myers, "This Poem Sucks".



Kapil Hari Paranjape [kapil at imsc.res.in]


Mon, 16 Jul 2007 07:00:58 +0530

Hello,

Since I was the one who suggested copying /var/lib/dpkg (but leaving out /var/lib/dpkg/info) and /var/lib/aptitude, and since a number of people are wondering why ...

On Sun, 15 Jul 2007, Ben Okopnik wrote:

> Wouldn't the first two lines take care of that already? I'm missing the
> purpose of copying /var/lib/ here.

On Sun, 15 Jul 2007, Rick Moen wrote:

> As am I.

On Mon, 16 Jul 2007, Thomas Adam wrote:

> Yep, and as for copying /var/lib -- it wasn't I who suggested it, so I can't
> say.  :)

... let me explain.

The directory /var/lib/dpkg/alternatives contains the alternatives that you have chosen for the system by running update-alternatives.

The file /var/lib/dpkg/diversions lists all the diversions that you have created for your system by running dpkg-divert.

The file /var/lib/aptitude/pkgstates lists which packages were pulled in by automatic dependency checks.

So one way to restore your system if you have backed-up the directories /var/lib/dpkg and /var/lib/aptitude would be:

	dpkg --get-selections ....
	rm /var/lib/dpkg/status
	dpkg --set-selections ...
An alternate approach would be to back up /var/lib/dpkg/alternatives and /var/lib/dpkg/diversions. You can use "debfoster" to keep track of package dependencies and back up its (much smaller) file /var/lib/debfoster/keepers.
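
For reference, the data in question can be inspected with a few stock commands (the alternative name "editor" is just an example):

# What the alternatives system currently points at:
update-alternatives --display editor

# Any local diversions in effect:
dpkg-divert --list

# debfoster's list of packages you explicitly asked for:
cat /var/lib/debfoster/keepers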

Regards,

Kapil. --




Rick Moen [rick at linuxmafia.com]


Sun, 15 Jul 2007 19:24:31 -0700

Quoting Kapil Hari Paranjape ([email protected]):

> An alternate approach would be to backup /var/lib/dpkg/alternatives
> and /var/lib/dpkg/diversions. 

Kapil, thanks for reminding us of where data for the alternatives and diversions mechanisms reside -- which mechanisms I was aware of, but hadn't yet had occasion to use diversions on my systems, and I've made only relatively minimal use of alternatives. That's why I wasn't taking measures to collect and preserve any of that data.

Note: It wouldn't be sufficient to merely back up and restore /var/lib/dpkg/alternatives/* : That's a record kept by the update-alternatives utility as a result of its management of the /etc/alternatives/ symlink tree. You want to also get the contents of that tree back.

When I dealt with this matter before, I did not try to preserve any "alternatives" data as such, except through having a reference tarball of the prior /etc/ tree, on-hand for eyeball reference while reconstructing the prior machine's local configuration manually. Which approach has the advantage of always working and offering no possibility of unpleasant surprises -- and, honestly, the /etc/alternatives symlink tree is going to end up almost 100% correct if not totally so without any help: E.g., including vim in your package shopping list will result in vim replacing nvi as the system default "vi" utility (/etc/alternatives/vi symlink), without your needing to take any other steps at all.

Anyway, given a preserved copy of /var/lib/dpkg/alternatives/* from the prior system, I believe running "update-alternatives --set" will correctly rebuild one's /etc/alternatives/ symlinks as desired, if I'm reading the manpage correctly.
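
For instance (an illustrative invocation -- the alternative must already be registered by the owning package):

update-alternatives --set vi /usr/bin/vim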

By contrast, I believe it is perfectly sufficient to back up, preserve, and carry forward the /var/lib/dpkg/diversions file, which record appears to be self-contained.

> The file /var/lib/aptitude/pkgstates lists which packages were pulled
> in by automatic dependency checks.
[...]

> You can use "debfoster" to keep
> track of package dependency and backup its (much smaller) file
> /var/lib/debfoster/keepers.

I'm not sure that backing up and preserving dependency data is a useful thing to do. Do you think it is?

My working assumption -- and it seemed to prove valid when I used my aforementioned procedure to migrate my entire system -- is that your best tactic is to just build a minimal system, then hand the package tool your list of packages (carried over from a backup set) as a shopping list, and let it work out the dependencies on its own, fetching whatever the current requirements are, which may be quite different from what they used to be, in your machine's prior incarnation.

Strictly speaking, actually, what I ended up doing was building a minimal system, fetching a copy of the prior installed-packages list, manually snipping out all library packages using a text editor, and only then feeding that list to apt as a shopping list. Doing so avoided a large number of messy and time-wasting error messages resulting from the fact that requested package foo no longer required library bar but instead differently-named library baz.
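
That snipping could, of course, be approximated mechanically (the filename follows the earlier date-stamped example, and the pattern is only a rough stand-in for "library package"):

# Drop lib* packages from the saved selections before feeding them back to dpkg.
grep -v '^lib' selections-2007-07-14 > selections-trimmed
dpkg --set-selections < selections-trimmed
apt-get dselect-upgrade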

My general approach, then, is: Tell the package tool what I want, and let it work out the details without my needing to micromanage. On a system with good tools and a strong package policy (Debian, Ubuntu, etc.), this works well.




Kapil Hari Paranjape [kapil at imsc.res.in]


Mon, 16 Jul 2007 09:10:57 +0530

Hello,

On Sun, 15 Jul 2007, Rick Moen wrote:

> Anyway, given a preserved copy of /var/lib/dpkg/alternatives/* from the
> prior system, I believe running "update-alternatives --set" will correctly
> rebuild one's /etc/alternatives/ symlinks as desired, if I'm reading
> the manpage correctly.

That is right. I didn't say anything about this since /etc was also being backed up. Your analysis of what needs to be backed up from /var/lib/dpkg is on the nose as far as I can see.

> I'm not sure that backing up and preserving dependency data is a
> useful thing to do.  Do you think it is?

(After some cogitation I now realise that my reasons are as given below. Thanks for making me think about this!)

I preserve this not as a way to re-install the system but as a way to preserve my usual way of managing packages.

To re-install the system, I would just use "dpkg --get-selections" and if the upstream has evolved considerably, then resolve the dependencies that arise.

However, as I test/tryout a number of packages, I need a good package management tool (p-m-t) for a running system. This tool needs to have dependency information regarding which alternative out of the various packages providing a dependency I prefer. Depending on the p-m-t the data is stored in different places in /var/lib and I would want to back that up.

This is not very different from backing up /var/lib/mysql and other data which is stored by the system for shared use by different users. (Perhaps that is close to the definition of /var/lib in FHS).

So perhaps the correct instruction for backup would be:

	Look closely at /var/lib for data created by users
	that is stored there instead of in their home
	directories because it is shared.
 
	If such data has been generated automatically by the system
	using content available elsewhere (such as most contents of
	/var/lib/dpkg and /var/lib/texmf) then you should/could skip it. 
I believe that the latter class should usually be in /var/cache but this is often a somewhat fine distinction (and hence a source of flame-wars :-() .

Regards,

Kapil. --




Karl-Heinz Herrmann [khh at khherrmann.de]


Wed, 18 Jul 2007 21:23:45 +0200

On Sat, 14 Jul 2007 13:36:07 -0400 Ben Okopnik <[email protected]> wrote:

> "backuppc" requires a hard
> drive (well, maybe. I'll have to play with it and find out.) Any other

A flash drive should do :-) -- just make it ext2/3, not VFAT.

K.-H.




Ben Okopnik [ben at linuxgazette.net]


Fri, 20 Jul 2007 21:44:48 -0400

On Wed, Jul 18, 2007 at 09:23:45PM +0200, Karl-Heinz Herrmann wrote:

> On Sat, 14 Jul 2007 13:36:07 -0400
> Ben Okopnik <[email protected]> wrote:
> 
> > "backuppc" requires a hard
> > drive (well, maybe. I'll have to play with it and find out.) Any other
> 
> FLASH drive should do :-) just make it ext2/3 not VFAT

tar cvjSpf - -T file_list > /mnt/flash/backup-`date +%s`.tbz
:)

That would take care of it even if it was VFAT - but yeah, that's too much of a pain. You're right; it's much better to stick with a real filesystem.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
