linux-kernel - Re: call_usermodehelper in containers


Message-ID: <20131112083043.0ab78e67@tlielax.poochiereds.net>
Date:	Tue, 12 Nov 2013 08:30:43 -0500
From:	Jeff Layton <jlayton@...hat.com>
To:	Stanislav Kinsbursky <skinsbursky@...allels.com>
Cc:	Greg KH <gregkh@...uxfoundation.org>,
	<linux-kernel@...r.kernel.org>, <linux-fsdevel@...r.kernel.org>,
	<linux-nfs@...r.kernel.org>, <devel@...nvz.org>,
	<ebiederm@...ssion.com>, <oleg@...hat.com>, <bfields@...ldses.org>,
	<bharrosh@...asas.com>
Subject: Re: call_usermodehelper in containers

On Tue, 12 Nov 2013 17:02:36 +0400
Stanislav Kinsbursky <skinsbursky@...allels.com> wrote:

> 12.11.2013 15:12, Jeff Layton wrote:
> > On Mon, 11 Nov 2013 16:47:03 -0800
> > Greg KH <gregkh@...uxfoundation.org> wrote:
> >
> >> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote:
> >>> We have a bit of a problem wrt upcalls that use call_usermodehelper
> >>> with containers and I'd like to bring this to some sort of resolution...
> >>>
> >>> A particularly problematic case (though there are others) is the
> >>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a
> >>> program in userland to track some information on stable storage for
> >>> nfsd.
> >>
> >> I thought the discussion at the kernel summit about this issue was:
> >> 	- don't do this.
> >> 	- don't do it.
> >> 	- if you really need to do this, fix nfsd
> >>
> >
> > Sorry, I couldn't make the kernel summit so I missed that discussion. I
> > guess LWN didn't cover it?
> >
> > In any case, I guess then that we'll either have to come up with some
> > way to fix nfsd here, or simply ensure that nfsd can never be started
> > unless root in the container has a full set of capabilities.
> >
> > One sort of Rube Goldberg possibility to fix nfsd is:
> >
> > - when we start nfsd in a container, fork off an extra kernel thread
> >    that just sits idle. That thread would need to be a descendant of the
> >    userland process that started nfsd, so we'd need to create it with
> >    kernel_thread().
> >
> > - Have the kernel just start up the UMH program in the init_ns mount
> >    namespace as it currently does, but also pass the pid of the idle
> >    kernel thread to the UMH upcall.
> >
> > - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set
> >    itself up for doing things properly.
> >
> > Note that with this mechanism we can't actually run a different binary
> > per container, but that's probably fine for most purposes.
> >
> 
> Hmmm... Why can't we? We can go a bit further with the userspace idea.
> 
> We use UMH for a very limited number of user programs. Two, actually:
> 1) /sbin/nfs_cache_getent
> 2) /sbin/nfsdcltrack
> 

No, the kernel uses them for a lot more than that. Pretty much all of
the keys API upcalls use it. See all of the callers of
call_usermodehelper. All of them are running user binaries out of the
kernel, and almost all of them are certainly broken wrt containers.

> If we convert them into proxies that use /proc/<pid>/root and /proc/<pid>/ns/*, this will allow us to look up the right binary.
> The only limitation here is that these "proxy" binaries must be present on the "host".
> 
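
For reference, the setup step that such a proxy would perform via
/proc/<pid>/root and /proc/<pid>/ns/* could look roughly like the sketch
below. This is only an illustration, not code from nfs-utils or the
kernel: proc_path() and enter_container() are hypothetical names, only
the mount namespace is joined, and the caller is assumed to hold
CAP_SYS_ADMIN and CAP_SYS_CHROOT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/<leaf>" in buf. Returns 0, or -1 if it does not fit. */
static int proc_path(char *buf, size_t len, pid_t pid, const char *leaf)
{
	int n;

	assert(buf != NULL && leaf != NULL);
	n = snprintf(buf, len, "/proc/%ld/%s", (long)pid, leaf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: move the calling process into the mount namespace
 * and root directory of the task identified by pid (for example the idle
 * kernel thread whose pid was passed to the upcall).
 * Returns 0 on success, -1 on failure.
 */
static int enter_container(pid_t pid)
{
	char path[64];
	int rootfd, nsfd = -1, ret = -1;

	/* Open the target's root before switching namespaces. */
	if (proc_path(path, sizeof(path), pid, "root") < 0)
		return -1;
	rootfd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (rootfd < 0) {
		fprintf(stderr, "open %s: %s\n", path, strerror(errno));
		return -1;
	}

	if (proc_path(path, sizeof(path), pid, "ns/mnt") < 0)
		goto out;
	nsfd = open(path, O_RDONLY | O_CLOEXEC);
	if (nsfd < 0) {
		fprintf(stderr, "open %s: %s\n", path, strerror(errno));
		goto out;
	}

	/* Join the mount namespace, then make the target's root our root. */
	if (setns(nsfd, CLONE_NEWNS) == 0 &&
	    fchdir(rootfd) == 0 && chroot(".") == 0)
		ret = 0;
out:
	if (nsfd >= 0)
		close(nsfd);
	close(rootfd);
	return ret;
}
```

After enter_container() succeeds, the proxy would exec the real helper
(e.g. /sbin/nfsdcltrack) as resolved inside the container. A complete
proxy would presumably also have to join the other namespaces under
/proc/<pid>/ns/ and deal with credentials, which this sketch leaves out.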

Suppose I spawn my own container as a user, using all of this spiffy
new user namespace stuff. Then I make the kernel use
call_usermodehelper to call the upcall in the init_ns, and then trick
it into running my new "escape_from_namespace" program with "real" root
privileges.

I don't think we can reasonably assume that having the kernel exec an
arbitrary binary inside of a container is safe. Doing so inside of the
init_ns is marginally more safe, but only marginally so...

> And we don't need any significant changes in kernel.
> 
> BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary?
> What are the capabilities that force us to do so?
> 

Nothing _forces_ us to do so, but upcalls are very difficult to handle,
and UMH has a lot of advantages over a long-running daemon launched by
userland.

Originally, I created the nfsdcltrack upcall as a running daemon called
nfsdcld, and the kernel used rpc_pipefs to communicate with it.

Everyone hated it because no one likes to have to run daemons for
infrequently used upcalls. It's a pain for users to ensure that it's
running and it's a pain to handle when it isn't. So, I was encouraged
to turn that instead into a UMH upcall.

But leaving that aside, this problem is a lot larger than just nfsd. We
have a *lot* of UMH upcalls in the kernel, so "fixing" nfsd's alone
won't solve it.

-- 
Jeff Layton <jlayton@...hat.com>