Message-ID: <4CD26270.5050906@kernel.org>
Date: Thu, 04 Nov 2010 08:36:16 +0100
From: Tejun Heo <tj@...nel.org>
To: Nathan Lynch <ntl@...ox.com>
CC: Christoph Hellwig <hch@....de>,
Oren Laadan <orenl@...columbia.edu>,
ksummit-2010-discuss@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, kapil@....neu.edu, gene@....neu.edu
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Hello,
On 11/04/2010 02:47 AM, Nathan Lynch wrote:
>> In this case whitelisting the allowed
>> state by requiring special APIs for all I/O (or even just standard
>> APIs as long as they are supported by the C/R lib you're linked against)
>> is the more pragmatic, and I think faithful, approach.
>
> I don't think users will go for it. They'll continue to use dodgy
> out-of-tree kernel modules and/or LD_PRELOAD hacks instead of porting
> their applications to a new library. I think a C/R library is an
> "ideal" solution, but it's one that nobody would use - especially in
> HPC, unless the library somehow provides better performance.
I hear that there are plans to integrate one of the userland
snapshotting implementations with an HPC workload manager. ISTR the
combination being Condor + DMTCP, but I'm not sure. I think things like
that make a lot of sense. Scientists writing programs for HPC
clusters already work within given frameworks, and what those applications
do and how they recover are pretty well confined/defined. If you
integrate snapshotting with such frameworks, it becomes pretty easy
for both the admins and users.
I'll talk about other issues in the reply to Oren's email.
Thanks.
--
tejun