linux-kernel - Re: checkpoint/restart ABI

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 12 Aug 2008 09:49:05 -0500
From:	"Serge E. Hallyn" <serue@...ibm.com>
To:	Peter Chubb <peterc@...ato.unsw.edu.au>
Cc:	Jeremy Fitzhardinge <jeremy@...p.org>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Arnd Bergmann <arnd@...db.de>,
	containers@...ts.linux-foundation.org,
	Theodore Tso <tytso@....edu>, linux-kernel@...r.kernel.org
Subject: Re: checkpoint/restart ABI

Quoting Peter Chubb (peterc@...ato.unsw.edu.au):
> >>>>> "Jeremy" == Jeremy Fitzhardinge <jeremy@...p.org> writes:
> 
> Jeremy> Dave Hansen wrote:
> >> Arnd, Jeremy and Oren,
> >> 
> 
> 
> Jeremy>    * multiple processes * pipes * UNIX domain sockets * INET
> Jeremy> sockets (both inter and intra machine) * unlinked open files *
> Jeremy> checkpointing file content * closed files (ie, files which
> Jeremy> aren't currently open, but will be soon, esp tmp files) *
> Jeremy> shared memory * (Peter, what have I forgotten?)
> 
> File sharing; multiple threads with wierd sharing arrangements (think:
> clone with various parameters, followed by exec in some of the threads
> but not others); MERT/system-V shared memory, semaphores and message
> queues; devices (audio, framebuffer, etc), HugeTLBFS, numa issues
> (pinning, memory layout), processes being debugged (so,
> checkpoint.restart a gdb/target pair), futexes, etc., etc.  Linux
> process state keeps expanding.
> 
> Jeremy> Having gone through this before, I don't think an all-kernel
> Jeremy> solution can work except for the most simple cases.
> 
> I agree ... it's better to put mechanisms into the kernel that can
> then be used by a user-space programme to actually do the
> checkpointing and restarting.
> 
> Beefing up ptrace or fixing /proc to be a real debugging interface
> would be a start ... when you can get at *all* the info you need,

Except we don't really want to export all the info you need for a
complete restartable checkpoint.  And especially not make it
generally writable.

We have also started down that path using ptrace (see cryo, at
git://git.sr71.net/~hallyn/cryodev.git).

Right before the containers mini-summit, where the general agreement was
that a complete in-kernel solution ought to be pursued, I had tried
a restart using a binary format that read a checkpoint file and used
cryo (userspace using ptrace) for the rest of the restart, only
because there was no other reasonable way to set tsk->did_exec on
restart.

> quickly and easily, the userspace checkpoint falls out fairly
> naturally.  You still have to work out an extensible file format to
> store stuff, and how to restore all that state you've so lovingly
> collected.
> 
> Jeremy> Lightweight filesystem checkpointing, such as btrfs provides,
> Jeremy> would seem like a powerful mechanism for handling a lot of the
> Jeremy> filesystem state problems.  It would have been useful when we
> Jeremy> did this...
> 
> And how!  saving bits of files was very timeconsuming.

Yes, we're looking forward to using btrfs' snapshots :)

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/