Message-ID: <20090415195629.GD26994@x200.localdomain>
Date: Wed, 15 Apr 2009 23:56:29 +0400
From: Alexey Dobriyan <adobriyan@...il.com>
To: Oren Laadan <orenl@...columbia.edu>
Cc: containers@...ts.osdl.org, Dave Hansen <dave@...ux.vnet.ibm.com>,
"Serge E. Hallyn" <serue@...ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux-Kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>
Subject: C/R without "leaks" (was: Re: Creating tasks on restart: userspace
vs kernel)
> Again, so to checkpoint one task in the topmost pid-ns you need to
> checkpoint (if at all possible) the entire system ?!
One more argument for not allowing "leaks" and for checkpointing the whole
container, no ifs, buts and woulditbenices.
Just to clarify, C/R with a "leak" is, for example, when a process has a
separate pidns but shares, say, its netns with another process not involved
in the checkpoint.
If you allow this, you lose one important property of the checkpoint part,
namely that almost everything is frozen. Losing this property means that
suddenly much more stuff is alive during the dump, and you have to account
for more stuff when checkpointing. You are effectively checkpointing live
data structures, and there is no guarantee you'll get it right.
Example 1: utsns is shared with the rest of the world.
utsns content is modifiable only by tasks (current->nsproxy->uts_ns).
Consequently, someone can modify utsns content while you're dumping it
if you allow "leaks".
Did you take precautions? Where?
static int cr_write_utsns(struct cr_ctx *ctx, struct uts_namespace *uts_ns)
{
	struct cr_hdr h;
	struct cr_hdr_utsns *hh;
	int domainname_len;
	int nodename_len;
	int ret;

	h.type = CR_HDR_UTSNS;
	h.len = sizeof(*hh);

	hh = cr_hbuf_get(ctx, sizeof(*hh));
	if (!hh)
		return -ENOMEM;

	nodename_len = strlen(uts_ns->name.nodename) + 1;
	domainname_len = strlen(uts_ns->name.domainname) + 1;
	hh->nodename_len = nodename_len;
	hh->domainname_len = domainname_len;

	ret = cr_write_obj(ctx, &h, hh);
	cr_hbuf_put(ctx, sizeof(*hh));
	if (ret < 0)
		return ret;

	ret = cr_write_string(ctx, uts_ns->name.nodename, nodename_len);
	if (ret < 0)
		return ret;

	ret = cr_write_string(ctx, uts_ns->name.domainname, domainname_len);
	return ret;
}
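You should take uts_sem. Without it, nodename or domainname can change
between the strlen() and the matching cr_write_string().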
Example 2: ipcns is shared with the rest of the world.
Consequently, the shm segment is visible outside and live. Someone may
already have shmatted to it. What will end up in the shm segment content?
Anything.
You should check the struct file refcount or something, and disable
attaching while dumping or something.
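
The "someone already shmatted to it" condition can at least be observed from userspace. The sketch below uses the standard SysV calls shmctl(IPC_STAT) and the shm_nattch field; shm_attach_count() and shm_safe_to_dump() are hypothetical helpers made up for illustration. It covers detection only: it does not disable attaching, so the answer can go stale right after the check, which is exactly the complication described above.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Number of current attachments to a SysV shm segment, or -1 on error. */
static long shm_attach_count(int shmid)
{
	struct shmid_ds ds;

	if (shmctl(shmid, IPC_STAT, &ds) < 0)
		return -1;
	return (long)ds.shm_nattch;
}

/*
 * Hypothetical policy check: the segment is considered safe to dump only
 * if it has no more attachments than the checkpointed tasks account for.
 * Racy by construction: a new shmat() can happen right after this returns.
 */
static int shm_safe_to_dump(int shmid, long expected)
{
	long n;

	assert(expected >= 0);
	n = shm_attach_count(shmid);
	return n >= 0 && n <= expected;
}
```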
Moral: every time you dump something live, you get complications.
Every single time.
Sockets and a live netns are the most complex example. I'm not
prepared to describe it exactly, but people wishing to do C/R with
"leaks" should be very careful with their wishes.