Message-ID: <20090313163531.GA10685@us.ibm.com>
Date: Fri, 13 Mar 2009 11:35:31 -0500
From: "Serge E. Hallyn" <serue@...ibm.com>
To: Cedric Le Goater <legoater@...e.fr>
Cc: Alexey Dobriyan <adobriyan@...il.com>, linux-api@...r.kernel.org,
containers@...ts.linux-foundation.org, hpa@...or.com,
linux-kernel@...r.kernel.org,
Dave Hansen <dave@...ux.vnet.ibm.com>, linux-mm@...ck.org,
viro@...iv.linux.org.uk, mingo@...e.hu, mpm@...enic.com,
tglx@...utronix.de, torvalds@...ux-foundation.org,
Andrew Morton <akpm@...ux-foundation.org>, xemul@...nvz.org
Subject: Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?

Quoting Cedric Le Goater (legoater@...e.fr):
>
> > No, what you're suggesting does not suffice.
>
> probably. I'm still trying to understand what you mean below :)
>
> Man, I hate these hierarchical pid_ns. One level would have been enough,
> just one vpid attribute in 'struct pid*'.

Well I don't mind - temporarily - saying that nested pid namespaces
are not checkpointable. It's just that if we're going to need a new
syscall anyway, then why not go ahead and address the whole problem?
It's not hugely more complicated, and seems worth it.

> > Call
> > (5591,3,1) the task known as 5591 in the init_pid_ns, 3 in a child pid
> > ns, and 1 in grandchild pid_ns created from there. Now assume we are
> > checkpointing tasks T1=(5592,1), and T2=(5594,3,1).
> >
> > We don't care about the first number in the tuples, so they will be
> > random numbers after the recreate.
>
> yes.
>
> > But we do care about the second numbers.
>
> yes, very much, and we need a way to set these numbers in alloc_pid()
>
> > But specifying CLONE_NEWPID while recreating the process tree
> > in userspace does not allow you to specify the 3 in (5594,3,1).
>
> I haven't looked closely at hierarchical pid namespaces but as we're
> using an array of pids indexed by the pidns level, I don't see why
> it shouldn't be possible. you might be right.
>
> anyway, I think that some CLONE_NEW* should be forbidden. Daniel should
> soon send a small patch for the ns_cgroup restricting the clone flags
> that can be used in a container.

Uh, that feels a bit over the top. We want to make this
uncheckpointable (if it remains so), not prevent the whole action.
After all, I may be running a container which I don't plan on ever
checkpointing, and inside that container running a job which I do
want to migrate.

So depending on whether we're doing it the Dave way or the
rest-of-the-world way :), we either clear_bit(pidns->may_checkpoint) on
the parent pid_ns when a child is created, or we walk every task being
checkpointed and make sure they are all in the same pid_ns. Doesn't
that suffice?
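
To make the two options concrete, here is a rough userspace sketch. It is
not kernel code: struct pid_ns_model, struct task_model and the helper
names are made up for illustration, and may_checkpoint is only the
hypothetical flag mentioned above, modelled as a plain bool rather than
a bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; not the kernel's pid namespace or task structures. */
struct pid_ns_model {
    struct pid_ns_model *parent;  /* NULL for the initial namespace */
    int level;                    /* 0 for the initial namespace */
    bool may_checkpoint;          /* hypothetical flag from the text above */
};

struct task_model {
    struct pid_ns_model *ns;      /* innermost pid namespace of the task */
};

static void ns_init_root(struct pid_ns_model *ns)
{
    assert(ns != NULL);
    ns->parent = NULL;
    ns->level = 0;
    ns->may_checkpoint = true;
}

/*
 * Option 1: when a child pid namespace is created, mark the parent as
 * no longer checkpointable. Nothing is forbidden; the clone still works.
 */
static void ns_init_child(struct pid_ns_model *child,
                          struct pid_ns_model *parent)
{
    assert(child != NULL && parent != NULL);
    child->parent = parent;
    child->level = parent->level + 1;
    child->may_checkpoint = true;
    parent->may_checkpoint = false;
}

/*
 * Option 2: at checkpoint time, walk the tasks being checkpointed and
 * check that they all live in the same pid namespace.
 */
static bool tasks_share_pid_ns(const struct task_model *tasks, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (tasks[i].ns != tasks[0].ns)
            return false;
    }
    return true;
}
```

With the earlier example, T1=(5592,1) sits in a child pid_ns and
T2=(5594,3,1) in a grandchild, so under this model the option 2 check
would refuse to checkpoint the two together, while option 1 would already
have cleared the flag on the child pid_ns when the grandchild was created.
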
-serge