Message-ID: <987664A83D2D224EAE907B061CE93D53016485FE6E@orsmsx505.amr.corp.intel.com>
Date: Fri, 5 Nov 2010 04:57:33 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Kapil Arya <kapil@....neu.edu>, Oren Laadan <orenl@...columbia.edu>
CC: "ksummit-2010-discuss@...ts.linux-foundation.org"
<ksummit-2010-discuss@...ts.linux-foundation.org>,
Gene Cooperman <gene@....neu.edu>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [Ksummit-2010-discuss] checkpoint-restart: naked patch
> Oren noted that sometimes it's important to stop the process only
> for a few milliseconds while one checkpoints. In DMTCP, we do that
> by configuring with --enable-forked-checkpointing. This causes us
> to fork a child process taking advantage of copy-on-write and then
> checkpoint the memory pages of the child while the parent continues
> to execute.
Interesting ... but while the process is only stopped for the duration
of the fork, it may then take a COW fault on almost every page it
touches, and each fault copies a page. I think this will not work well
for large HPC applications that hold most of physical memory as
anonymous pages. It may even result in an OOM kill if the child does
not complete the checkpoint and exit in a timely manner.
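
To make the concern concrete, here is a minimal sketch of the fork-then-checkpoint pattern described in the quoted text. It is not DMTCP's code: the names forked_checkpoint, demo, and PAGE are hypothetical, a bytearray stands in for the application's anonymous memory, and a 4096-byte page size is assumed. The child writes out the state as it was at fork time while the parent keeps dirtying pages. In CPython the interpreter's own bookkeeping also touches pages, so this only illustrates the control flow, not real fault counts.

```python
import os
import tempfile

PAGE = 4096  # assumed page size, for illustration only


def forked_checkpoint(state, path):
    """Fork; the child writes its fork-time view of `state` to `path`."""
    pid = os.fork()
    if pid == 0:
        # Child: its memory is a copy-on-write view of the parent at fork time.
        status = 1
        try:
            with open(path, "wb") as f:
                f.write(state)
            status = 0
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    return pid  # Parent: returns immediately and keeps executing.


def demo():
    state = bytearray(b"A" * PAGE * 4)
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        pid = forked_checkpoint(state, path)
        # Parent dirties one byte per page. While the child is alive, the
        # first write to each shared page makes the kernel copy that page.
        for i in range(0, len(state), PAGE):
            state[i] = ord("B")
        _, status = os.waitpid(pid, 0)
        assert os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        with open(path, "rb") as f:
            snapshot = f.read()
    finally:
        os.unlink(path)
    return snapshot, bytes(state)


if __name__ == "__main__":
    snapshot, live = demo()
    print(snapshot == b"A" * PAGE * 4, live.count(b"B"))
```

The snapshot holds the pre-fork contents even though the parent has moved on. The cost is that every page the parent writes before the child exits exists twice, so a parent that touches most of its memory needs close to double its footprint until the checkpoint finishes.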
-Tony