Message-ID: <4909FDD3.5090806@cs.columbia.edu>
Date: Thu, 30 Oct 2008 14:32:51 -0400
From: Oren Laadan <orenl@...columbia.edu>
To: Louis.Rilling@...labs.com
CC: Andrey Mirkin <major@...nvz.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>,
"Serge E. Hallyn" <serue@...ibm.com>,
Cedric Le Goater <clg@...ibm.com>,
Daniel Lezcano <dlezcano@...ibm.com>,
containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
Louis Rilling wrote:
> On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
>>
>> Louis Rilling wrote:
>>> In Kerrighed this is kernel-based, and will remain kernel-based because we
>>> checkpoint a distributed task tree, and want to restart it as much as possible
>>> with the same distribution. The distributed protocol used for restart is
>>> currently too fragile and complex to rely on customized user-space
>>> implementations. That said, if someone brings very good arguments in favor of
>>> userspace implementations, we might consider changing this.
>> Zap also has distributed checkpoint which does not require strict
>> kernel-side ordering. Do you need that because you do SSI ?
>
> Yes. Tasks from different nodes have parent-children, session leader, etc.
> relationships, and the distributed management of struct pid lifecycle is a bit
> touchy too. By the way, splitting the checkpoint image in one file for each task
> helps us a lot to make restart parallel, because it is more efficient for the file
> system to handle parallel reads of different files from different nodes than
> parallel reads on a single file descriptor from different nodes.
You can also make parallel restart work with a single stream without
much effort, particularly if you store everything on the file system.
In both cases, the limiting factor is shared resources: one task
cannot proceed with its restart because it must wait for another task
to first (re)create that resource.
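
A minimal user-space sketch of the point above, not code from Zap or
Kerrighed. It assumes a single image file with a per-task (offset, length)
index, and a made-up record format "resource_key|task_state". All names
here (write_image, SharedResources, restore_task, parallel_restart) are
hypothetical. Each worker reads its own record with os.pread(), so the
readers do not share a file position, and the only serialization left is
the wait on a shared resource that another task has to (re)create first.

```python
import os
import threading


def write_image(path, records):
    """Write task records back to back; return {task_id: (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        for task_id, payload in records.items():
            index[task_id] = (f.tell(), len(payload))
            f.write(payload)
    return index


class SharedResources:
    """Each shared resource is (re)created once; other tasks wait for it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = {}    # key -> Event, set once the resource exists
        self._values = {}   # key -> the (re)created resource
        self.creations = 0

    def get_or_create(self, key, create):
        with self._lock:
            event = self._ready.get(key)
            is_creator = event is None
            if is_creator:
                event = self._ready[key] = threading.Event()
        if is_creator:
            value = create(key)
            with self._lock:
                self._values[key] = value
                self.creations += 1
            event.set()
        else:
            # The limiting factor: this task stalls until the creator is done.
            event.wait()
        return self._values[key]


def restore_task(fd, index, task_id, shared, results):
    offset, length = index[task_id]
    # Positional read: no seek, so threads can share one file descriptor.
    # A robust reader would loop on short reads; omitted in this sketch.
    data = os.pread(fd, length, offset)
    key, state = data.decode().split("|", 1)
    resource = shared.get_or_create(key, lambda k: "resource:" + k)
    results[task_id] = (resource, state)


def parallel_restart(path, index):
    """Restore every task in the index concurrently from one image file."""
    shared = SharedResources()
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        threads = [
            threading.Thread(target=restore_task,
                             args=(fd, index, task_id, shared, results))
            for task_id in index
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)
    return results, shared.creations
```

With three tasks of which two refer to the same resource key, the resource
is created twice in total (once per distinct key), and whichever of the two
sharing tasks arrives second simply waits. Splitting the image into one file
per task would change only how the records are read, not this wait.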
[...]
Oren.