Message-ID: <20111019182228.GJ25124@google.com>
Date: Wed, 19 Oct 2011 11:22:28 -0700
From: Tejun Heo <tj@...nel.org>
To: Pavel Emelyanov <xemul@...allels.com>
Cc: Cyrill Gorcunov <gorcunov@...nvz.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andrey Vagin <avagin@...allels.com>,
James Bottomley <jbottomley@...allels.com>,
Glauber Costa <glommer@...allels.com>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...e.hu>,
Dave Hansen <dave@...ux.vnet.ibm.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Daniel Lezcano <dlezcano@...ibm.com>,
Alexey Dobriyan <adobriyan@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [patch 5/5] elf: Add support for loading ET_CKPT files
Hello, Pavel.
On Wed, Oct 19, 2011 at 01:03:17PM +0400, Pavel Emelyanov wrote:
> > I don't think this is a good idea. We already have most of interface
> > necessary for restoring task state and there's no need to put it into
> > the kernel as one piece. If there's something which can't be done
> > from userland using existing interfaces, let's please discuss what
> > they are and whether they can be resolved differently first.
>
> The rest of the API we will require will look too special-purpose (e.g. the
> ability Cyrill mentioned to set mm fields such as start_code, etc).
I find that quite difficult to agree with. We're talking about some
minor additions to prctl versus updating the exec path to do something
it was never designed to do, plus a new binary format.
> Besides, putting a parasite code to restore the memory state will drive
> us into certain limitations. E.g. - what address range in the target task
> address space should the parasite occupy? What if we fail in finding one,
> how should we act next?
How is that different from snapshotting time? Unless the address space
is completely packed, I can't see how that can be a problem. In the
worst case, you can use the parasite to map what's not overlapping and
then let the ptracer restore the overlapping areas after the parasite
is done.
> > The exec path is very intricate as it is and it would be very easy to
> > introduce security or other bugs by altering its basic behavior.
>
> Can you elaborate a bit more on this? "Other" bugs can be introduced by
> any piece of code injected into the kernel, but what about security?
exec is a pretty important security boundary. A lot of resources (fds,
VMAs, credentials) have security-sensitive policies enforced across the
exec boundary, and there are plenty of places which depend on exec's
reset-everything property (e.g. no other users of thread-group-wide
resources).
> > exec presumes destruction of (most of) the current process and all its
> > other threads and replacing them w/ fresh states from an executable.
>
> This is *exactly* what is expected from the restoration process.
Umm.... so why does the patch skip zapping? And if you are gonna be
zapping, how are you gonna do multiple threads?
> > I see that you removed zapping to allow restoring multi-threaded
> > process, which seems quite scary to me. It might be okay, I don't
> > know, but IMHO it just isn't a very good idea to introduce such
> > variation to basic exec behavior for this rather narrow use case.
>
> Why? Can you describe your thinking on this in more detail? Maybe I'm
> really missing something important.
Because it breaks one of the very basic assumptions - that there are no
other tasks in the thread group, and thus that resources usually shared
among the thread group can be assumed to be exclusively owned.
> > In addition, leaving it alone doesn't really solve multi-threaded
> > restoring, does it?
>
> So your main concern is about multi-threaded tasks, right? I think we can
> prepare an exec handler capable of showing how this can look.
I'm strongly against that direction regardless of implementation
details.
> > The current code doesn't seem to be doing anything but if you're gonna add
> > it later, how are you gonna synchronize other threads?
>
> Synchronize what? If we're providing userspace with a way to put things
> right, it's up to userspace to use it correctly. We saw this with pid
> namespaces - there are plenty of ways to kill an app with the CLONE_NEWPID
> clone flag, but the intention is not to be 100% fool-proof, but to provide
> the ability to do what userspace needs.
To shared kernel data structures.
> One way to restore a multi-threaded task would be to clone the threads
> from userspace, wait for them to prepare themselves the way they need,
> have the threads go to sleep, and then have the leader call execve. Thus
> the userspace state consistency is preserved.
If you're gonna let userspace do it, why bother with in-kernel thing
at all?
> > * It's still calling exec_mmap(), so the exec'ing thread will be on
> > the new mm while other threads would still be on the old one.
>
> Why do you think it's a problem?
Ummm.... they aren't in the same address space? How can that be okay?
It's not only wrong from a C/R POV. The kernel disallows
CLONE_THREAD/SIGHAND w/o CLONE_VM and depends on that assumption;
you're breaking it.
> > * There are operations which assume that the calling task is
> > de-threaded and thus has exclusive access to data structures which
> > can be shared among the thread group (e.g. signal struct). How are
> > accesses to them synchronized?
>
> If we're talking about keeping the userspace state solid, then stopping the
> other threads solves this.
Again, there are kernel exclusivity assumptions.
> That's a pity. As I stated above, we'll try to demonstrate an exec handler
> capable of restoring a multi-threaded app and send the 2nd RFC. Can you
> please answer my questions above so we can keep your concerns in mind
> while doing this?
IMHO, this is a fundamentally broken approach which isn't even
necessary. So, FWIW, NACK.
Thank you.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/