Message-ID: <52860B6D.4090208@parallels.com>
Date: Fri, 15 Nov 2013 15:54:21 +0400
From: Stanislav Kinsbursky <skinsbursky@...allels.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Jeff Layton <jlayton@...hat.com>,
Greg KH <gregkh@...uxfoundation.org>,
<linux-kernel@...r.kernel.org>, <linux-fsdevel@...r.kernel.org>,
<linux-nfs@...r.kernel.org>, <devel@...nvz.org>, <oleg@...hat.com>,
<bfields@...ldses.org>, <bharrosh@...asas.com>
Subject: Re: call_usermodehelper in containers
15.11.2013 15:03, Eric W. Biederman wrote:
> Stanislav Kinsbursky <skinsbursky@...allels.com> writes:
>
>> 12.11.2013 17:30, Jeff Layton wrote:
>>> On Tue, 12 Nov 2013 17:02:36 +0400
>>> Stanislav Kinsbursky <skinsbursky@...allels.com> wrote:
>>>
>>>> 12.11.2013 15:12, Jeff Layton wrote:
>>>>> On Mon, 11 Nov 2013 16:47:03 -0800
>>>>> Greg KH <gregkh@...uxfoundation.org> wrote:
>>>>>
>>>>>> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote:
>>>>>>> We have a bit of a problem wrt to upcalls that use call_usermodehelper
>>>>>>> with containers and I'd like to bring this to some sort of resolution...
>>>>>>>
>>>>>>> A particularly problematic case (though there are others) is the
>>>>>>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a
>>>>>>> program in userland to track some information on stable storage for
>>>>>>> nfsd.
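(For anyone skimming: that upcall is nothing fancy. It boils down to a
plain call_usermodehelper() call, roughly like the fragment below; the
exact argv is from memory and "client_id" just stands for whatever
identifier nfsd passes, so treat it as a sketch:

	char *argv[] = { "/sbin/nfsdcltrack", "create", client_id, NULL };
	char *envp[] = { "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
	int ret;

	/* exec happens via the khelper thread, i.e. in init's namespaces,
	 * regardless of which container's nfsd triggered the upcall */
	ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);

That is exactly why the binary is always resolved in the init mount
namespace.)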
>>>>>>
>>>>>> I thought the discussion at the kernel summit about this issue was:
>>>>>> - don't do this.
>>>>>> - don't do it.
>>>>>> - if you really need to do this, fix nfsd
>>>>>>
>>>>>
>>>>> Sorry, I couldn't make the kernel summit so I missed that discussion. I
>>>>> guess LWN didn't cover it?
>>>>>
>>>>> In any case, I guess then that we'll either have to come up with some
>>>>> way to fix nfsd here, or simply ensure that nfsd can never be started
>>>>> unless root in the container has a full set of capabilities.
>>>>>
>>>>> One sort of Rube Goldberg possibility to fix nfsd is:
>>>>>
>>>>> - when we start nfsd in a container, fork off an extra kernel thread
>>>>> that just sits idle. That thread would need to be a descendant of the
>>>>> userland process that started nfsd, so we'd need to create it with
>>>>> kernel_thread().
>>>>>
>>>>> - Have the kernel just start up the UMH program in the init_ns mount
>>>>> namespace as it currently does, but also pass the pid of the idle
>>>>> kernel thread to the UMH upcall.
>>>>>
>>>>> - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set
>>>>> itself up for doing things properly.
>>>>>
>>>>> Note that with this mechanism we can't actually run a different binary
>>>>> per container, but that's probably fine for most purposes.
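To make the last step concrete: the helper would essentially do the
nsenter dance before exec'ing anything. A rough, untested sketch of
that part (only the mount namespace is shown, error reporting trimmed):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <sched.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Join the mount namespace of the pid the kernel handed us.
	 * setns(CLONE_NEWNS) also moves our root and cwd to the root of
	 * the target namespace, so no explicit chroot() is needed. */
	static int enter_container(pid_t pid)
	{
		char path[64];
		int fd;

		snprintf(path, sizeof(path), "/proc/%d/ns/mnt", (int)pid);
		fd = open(path, O_RDONLY);
		if (fd < 0)
			return -1;
		if (setns(fd, CLONE_NEWNS) < 0) { /* needs CAP_SYS_ADMIN */
			close(fd);
			return -1;
		}
		close(fd);
		return 0;
	}

The same /proc/<pid>/ns/* trick extends to the other namespaces (net,
ipc, ...) where the helper needs them.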
>>>>>
>>>>
>>>> Hmmm... Why can't we? We can go a bit further with the userspace idea.
>>>>
>>>> We use UMH for a very limited number of user programs. Two, actually:
>>>> 1) /sbin/nfs_cache_getent
>>>> 2) /sbin/nfsdcltrack
>>>>
>>>
>>> No, the kernel uses them for a lot more than that. Pretty much all of
>>> the keys API upcalls use it. See all of the callers of
>>> call_usermodehelper. All of them are running user binaries out of the
>>> kernel, and almost all of them are certainly broken wrt containers.
>>>
>>>> If we convert them into proxies, which use /proc/<pid>/root and
>>>> /proc/<pid>/ns/*, this will allow us to look up the right binary.
>>>> The only limitation here is the presence of these "proxy" binaries on
>>>> the "host".
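I.e. each "proxy" would be a trivial wrapper: enter the container
exactly as in the sketch above, then exec whatever that container
ships. With a hypothetical convention of pid in argv[1] and the real
helper in argv[2..] (reusing enter_container() from the earlier sketch,
plus <stdlib.h> for atoi()), it is just:

	int main(int argc, char **argv)
	{
		if (argc < 3 || enter_container(atoi(argv[1])) < 0)
			return 1;

		/* this path lookup now happens inside the container */
		execv(argv[2], &argv[2]);
		perror("execv");
		return 1;
	}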
>>>>
>>>
>>> Suppose I spawn my own container as a user, using all of this spiffy
>>> new user namespace stuff. Then I make the kernel use
>>> call_usermodehelper to call the upcall in the init_ns, and then trick
>>> it into running my new "escape_from_namespace" program with "real" root
>>> privileges.
>>>
>>> I don't think we can reasonably assume that having the kernel exec an
>>> arbitrary binary inside of a container is safe. Doing so inside of the
>>> init_ns is marginally more safe, but only marginally so...
>>>
>>>> And we don't need any significant changes in kernel.
>>>>
>>>> BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary?
>> What are these capabilities that force us to do so?
>>>>
>>>
>>> Nothing _forces_ us to do so, but upcalls are very difficult to handle,
>>> and UMH has a lot of advantages over a long-running daemon launched by
>>> userland.
>>>
>>> Originally, I created the nfsdcltrack upcall as a running daemon called
>>> nfsdcld, and the kernel used rpc_pipefs to communicate with it.
>>>
>>> Everyone hated it because no one likes to have to run daemons for
>>> infrequently used upcalls. It's a pain for users to ensure that it's
>>> running and it's a pain to handle when it isn't. So, I was encouraged
>>> to turn that instead into a UMH upcall.
>>>
>>> But leaving that aside, this problem is a lot larger than just nfsd. We
>>> have a *lot* of UMH upcalls in the kernel, so this problem is more
>>> general than just "fixing" nfsd's.
>>>
>>
>> Ok. So we are talking about a generic approach to UMH support in a container (and/or namespace).
>>
>> Actually, as far as I can see, there is more than one aspect that is
>> not supported. One of them is executing the right binary. Another one is
>> capabilities (and maybe there are more, like user namespaces), but I
>> don't really care about them for now.
>> Executing the right binary, actually, is not about namespaces at all.
>> This is about the lookup implementation in the VFS (do_execve_common).
>
>> It would be great to unshare the fs_struct for the forked UMH kthread
>> and swap its root to the desired one. This would solve the problem of
>> proper lookup. However, as far as I understand, this approach is not
>> welcomed by the community.
>
> I don't understand that one. Having a preforked thread with the proper
> environment that can act like kthreadd in terms of spawning user mode
> helpers works and is simple. The only downside I can see is that there
> is extra overhead.
>
What do you mean by "simple" here? Simple to implement?
We already have a preforked thread, called "UMH", used exactly for this purpose.
And, if I'm not mistaken, we are trying to discuss how to adapt the existing infrastructure to namespaces, aren't we?
> Beyond that, though, for the user mode helpers spawned to populate
> security keys, it is not clear which context they should be run in,
> even if we do have kernel threads.
>
Regardless of the context itself, we need a way to pass it to the kernel thread and to put the kernel thread into this context. Or am I missing something?
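For the mount namespace part at least, the plumbing seems to be there
already: call_usermodehelper_setup() takes an init callback and a data
pointer, so the caller could stash the container's root path there and
apply it in the child. An untested sketch of what I mean (and it
deliberately ignores the capabilities side):

	static int umh_set_root(struct subprocess_info *info,
				struct cred *new)
	{
		struct path *root = info->data;	/* stashed by the caller */
		int err;

		/* private fs_struct, so we don't disturb the parent */
		err = unshare_fs_struct();
		if (err)
			return err;

		/* binary lookup now starts from the container's root */
		set_fs_root(current->fs, root);
		return 0;
	}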
>> This problem, probably, can be solved by constructing the full binary
>> path (i.e. not in a container, but in the kernel thread's root context)
>> in the UMH "init" callback. However, this will help only if the dentry
>> is accessible from the "init" root, which is usually not true in the
>> case of mount namespaces, if I understand them right.
>
> You are correct: it cannot be assumed that what is visible in one mount
> namespace is visible in another. And of course, in addition to picking
> the correct binary to run, you have to set up a proper environment for
> that binary to run in. It may be that its configuration file is only
> available at the expected location in the proper mount namespace, even
> if the binary is available in all of the mount namespaces.
>
Yes, you are right. So, this solution can only help in the case of very specific and simple "environment-less" programs.
So, I believe we should modify UMH itself to support our needs. But I don't see how to make the idea more pleasant for the community.
IOW, when I was talking about UMH in the NFS implementation at Ksummit, Linus's answer was something like "fix NFS".
And I can't object to it, actually, because for now NFS is the only corner case.
Jeff said that there are a bunch of UMH calls in the kernel, but this is not solid enough to justify UMH changes, since nobody is trying to use them in containers.
So I doubt that we can change UMH generically without additional use cases for "containerized" UMH.
> Eric
>
--
Best regards,
Stanislav Kinsbursky