Message-ID: <20081030181418.GO15171@hawkmoon.kerlabs.com>
Date: Thu, 30 Oct 2008 19:14:18 +0100
From: Louis Rilling <Louis.Rilling@...labs.com>
To: Oren Laadan <orenl@...columbia.edu>
Cc: Andrey Mirkin <major@...nvz.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>,
"Serge E. Hallyn" <serue@...ibm.com>,
Cedric Le Goater <clg@...ibm.com>,
Daniel Lezcano <dlezcano@...ibm.com>,
containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
>
>
> Louis Rilling wrote:
> > In Kerrighed this is kernel-based, and will remain kernel-based because we
> > checkpoint a distributed task tree, and want to restart it as much as possible
> > with the same distribution. The distributed protocol used for restart is
> > currently too fragile and complex to rely on customized user-space
> > implementations. That said, if someone brings very good arguments in favor of
> > userspace implementations, we might consider changing this.
>
> Zap also has distributed checkpoint which does not require strict
> kernel-side ordering. Do you need that because you do SSI ?
Yes. Tasks from different nodes have parent-children, session leader, etc.
relationships, and the distributed management of struct pid lifecycle is a bit
touchy too. By the way, splitting the checkpoint image into one file per task
helps us a lot in making restart parallel, because it is more efficient for the file
system to handle parallel reads of different files from different nodes than
parallel reads on a single file descriptor from different nodes.
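
To illustrate the one-file-per-task layout, here is a minimal user-space sketch. It is not Kerrighed code: the file naming (task-<pid>.img), the helper names, and the use of threads on a single machine to stand in for separate nodes are all assumptions made for the example. The point is only that each reader opens its own file and has its own offset, so no descriptor is shared.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_task_images(directory, tasks):
    """Write one image file per task and return {pid: path}.

    The task-<pid>.img naming is hypothetical, chosen for this sketch.
    """
    paths = {}
    for pid, payload in tasks.items():
        path = os.path.join(directory, f"task-{pid}.img")
        with open(path, "wb") as f:
            f.write(payload)
        paths[pid] = path
    return paths


def read_image(path):
    """Read a whole per-task image through a private file descriptor."""
    with open(path, "rb") as f:
        return f.read()


def restore_in_parallel(paths, workers=4):
    """Read all per-task images concurrently and return {pid: bytes}.

    Threads stand in for nodes here; each one reads a different file.
    """
    pids = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blobs = list(pool.map(read_image, [paths[p] for p in pids]))
    return dict(zip(pids, blobs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        images = write_task_images(d, {1: b"init", 42: b"worker"})
        print(sorted(restore_in_parallel(images)))
```

With a single image file, the readers would instead have to coordinate offsets on one descriptor or each seek to its own region, which is the case described above as less efficient.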
>
> >
> > Without taking distributed restart into account, I also tend to prefer
> > kernel-based, mainly for two (not so strong) reasons:
> > 1) this prevents userspace from doing weird things, like changing the task tree
> > and letting the kernel detect it and deal with the mess this creates (think about
> > two threads being restarted in separate processes that do not even share their
> > parents). But one can argue that userspace can change the checkpoint image as
> > well, so that the kernel must check for such weird things anyway.
>
> I don't really buy this argument. First, as you say, user can change
> the checkpoint image file. Second, you can verify in the kernel that
> the real relationships of the processes match those specified (and
> expected from) the image file. That's pretty straightforward.
>
> > 2) restart will be more efficient with respect to shared objects.
>
> Can you elaborate on this ? In what sense "more efficient" ?
>
> Note that the topic in question is not whether to do the entire restart
> from user space (and I argue that most work should be done in the kernel),
> but rather whether process creation (and only that) should be done in
> kernel or user space.
See my answer to Dave.
>
> Quick thoughts of pros/cons of each approach are:
>
> user space:
>
> + re-use existing api (fork)
> + easier to debug
> + will allow 'handmade' resources restart: it was mentioned before that
> one may want to reattach stdout to a different place after restart; a
> user-based restart of processes can make this much easier: e.g. the
> user process can create the alternative resources, give them to the
> kernel and only then call sys_restart
> + arch-independent code
>
> - a bit slower than in kernel space
> - requires a clone-with-specific-pid syscall or interface
>
> kernel space:
>
> + a bit easier to control everything
> + a bit faster than user space
> + no need for user-visible interface for clone-with-...
>
> - arch-dependent code
> - needs special code to fight 'fork-bomb'
>
> So, I'm not convinced, and I even think there may be room for both, for
> the time being. I volunteer to support the user-space alternative while
> we make up our minds.
Yes, I hope that investigating both approaches will give us stronger arguments.
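
As a concrete reference point for the check discussed above (verifying that the relationships of the re-created processes match those recorded in the image), here is a rough sketch in Python. The data layout is an assumption made for the example: each task is described as pid -> (parent pid, session id). A real implementation would walk kernel task structures and would cover more relationships (thread groups, process groups, and so on).

```python
def find_mismatches(image_relations, actual_relations):
    """Return the sorted pids whose restored relationships differ from the image.

    Both arguments map pid -> (ppid, sid). This layout is hypothetical.
    A pid that is missing from actual_relations counts as a mismatch.
    """
    mismatches = []
    for pid, expected in image_relations.items():
        if actual_relations.get(pid) != expected:
            mismatches.append(pid)
    return sorted(mismatches)


if __name__ == "__main__":
    image = {1: (0, 1), 10: (1, 10), 11: (10, 10)}
    # Task 11 was re-created under the wrong parent.
    restored = {1: (0, 1), 10: (1, 10), 11: (1, 10)}
    print(find_mismatches(image, restored))
```

Such a check only confirms that user space built the tree the image asked for. It does not say whether the image itself has been modified, which is the other half of the argument above.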
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes