Date:	Thu, 30 Oct 2008 19:14:18 +0100
From:	Louis Rilling <Louis.Rilling@...labs.com>
To:	Oren Laadan <orenl@...columbia.edu>
Cc:	Andrey Mirkin <major@...nvz.org>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	"Serge E. Hallyn" <serue@...ibm.com>,
	Cedric Le Goater <clg@...ibm.com>,
	Daniel Lezcano <dlezcano@...ibm.com>,
	containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based
	checkpointing/restart

On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
> 
> 
> Louis Rilling wrote:
> > In Kerrighed this is kernel-based, and will remain kernel-based because we
> > checkpoint a distributed task tree, and want to restart it as much as possible
> > with the same distribution. The distributed protocol used for restart is
> > currently too fragile and complex to rely on customized user-space
> > implementations. That said, if someone brings very good arguments in favor of
> > userspace implementations, we might consider changing this.
> 
> Zap also has distributed checkpoint which does not require strict
> kernel-side ordering. Do you need that because you do SSI ?

Yes. Tasks from different nodes have parent-children, session leader, etc.
relationships, and the distributed management of struct pid lifecycle is a bit
touchy too. By the way, splitting the checkpoint image into one file per task
helps us a lot in making restart parallel, because it is more efficient for the file
system to handle parallel reads of different files from different nodes than
parallel reads on a single file descriptor from different nodes.
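
To make the point about per-task files concrete, here is a minimal sketch in
Python (not Kerrighed code). The task_<pid>.img naming, the function names, and
the use of a thread pool on a single node are all assumptions for illustration;
in our case the reads are issued from different nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_task_image(path):
    """Read one task's checkpoint file and return (pid, raw bytes)."""
    # Hypothetical naming scheme: task_<pid>.img
    pid = int(path.stem.split("_")[1])
    return pid, path.read_bytes()


def load_images_parallel(image_dir, workers=4):
    """Read all per-task files concurrently, each through its own descriptor."""
    paths = sorted(Path(image_dir).glob("task_*.img"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker opens a different file, so no reader has to share
        # a file offset or a descriptor with another one.
        return dict(pool.map(read_task_image, paths))
```

With a single image file, the readers would instead have to coordinate offsets
on one descriptor, which is the case that behaves worse for us.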

> 
> > 
> > Without taking distributed restart into account, I also tend to prefer
> > kernel-based, mainly for two (not so strong) reasons:
> > 1) this prevents userspace from doing weird things, like changing the task tree
> > and letting the kernel detect it and deal with the mess this creates (think about
> > two threads being restarted in separate processes that do not even share their
> > parents). But one can argue that userspace can change the checkpoint image as
> > well, so that the kernel must check for such weird things anyway.
> 
> I don't really buy this argument. First, as you say, user can change
> the checkpoint image file. Second, you can verify in the kernel that
> the real relationships of the processes match those specified (and
> expected from) the image file. That's pretty straightforward.
> 
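
The check described above could look roughly like the sketch below. The
pid -> parent pid dictionaries and the function name are hypothetical; a real
implementation would live in the kernel and would also have to cover sessions,
process groups, and thread groups.

```python
def verify_task_tree(image_parents, actual_parents):
    """Return the pids whose actual parent differs from the image.

    Both arguments map pid -> parent pid (a hypothetical representation
    of the relationships recorded in the checkpoint image and of those
    observed after user space has recreated the processes).
    """
    mismatches = []
    for pid, expected_parent in image_parents.items():
        # A task missing from actual_parents also counts as a mismatch.
        if actual_parents.get(pid) != expected_parent:
            mismatches.append(pid)
    return sorted(mismatches)
```

An empty result means the recreated tree matches the image; any other result
would be a reason to refuse the restart.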
> > 2) restart will be more efficient with respect to shared objects.
> 
> Can you elaborate on this ?  In what sense "more efficient" ?
> 
> Note that the topic in question is not whether to do the entire restart
> from user space (and I argue that most work should be done in the kernel),
> but rather whether process creation (and only that) should be done in
> kernel or user space.

See my answer to Dave.

> 
> Quick thoughts of pros/cons of each approach are:
> 
> user space:
> 
> + re-use existing api (fork)
> + easier to debug
> + will allow 'handmade' resources restart: it was mentioned before that
>   one may want to reattach stdout to a different place after restart; a
>   user based restart of processes can make this much easier: e.g. the
>   user process can create the alternative resources, give them to the
>   kernel and only then call sys_restart)
> + arch-independent code
> 
> - a bit slower than in kernel space
> - requires a clone-with-specific-pid syscall or interface
> 
> kernel space:
> 
> + a bit easier to control everything
> + a bit faster than user space
> + no need for user-visible interface for clone-with-...
> 
> - arch-dependent code
> - needs special code to fight 'fork-bomb'
> 
> So, I'm not convinced, and I even think there may be room for both, for
> the time being. I volunteer to support the user-space alternative while
> we make up our minds.

Yes, I hope that investigating both approaches will give us stronger arguments.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
