linux-kernel - Re: [PATCH v5 2/4] fuse: Support fuse filesystems outside of init_user

Message-ID: <20141112162254.GB31775@ubuntu-hedt>
Date:	Wed, 12 Nov 2014 10:22:54 -0600
From:	Seth Forshee <seth.forshee@...onical.com>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	"Serge H. Hallyn" <serge.hallyn@...ntu.com>,
	Andy Lutomirski <luto@...capital.net>,
	Michael j Theall <mtheall@...ibm.com>,
	fuse-devel@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, seth.forshee@...onical.com
Subject: Re: [PATCH v5 2/4] fuse: Support fuse filesystems outside of
 init_user_ns

On Wed, Nov 12, 2014 at 02:09:15PM +0100, Miklos Szeredi wrote:
> On Tue, Nov 11, 2014 at 09:37:10AM -0600, Eric W. Biederman wrote:
> 
> > > Maybe I'm being dense, but can someone give a concrete example of such an
> > > attack?
> > 
> > There are two variants of things at play here.
> > 
> > There is the classic case: if you don't freeze your context at open
> > time, then unexpected things can happen when you pass that file
> > descriptor to another process.
> > 
> > An essentially harmless but extremely confusing example is what happens
> > to a partial read when it stops halfway through a uid value and the next
> > read on the same file descriptor is from a process in a different user
> > namespace.  Which uid value should be returned to userspace?
> 
> Fuse device doesn't currently do partial reads, so that's a non-issue.
> 
> > Now if I am in a nefarious mood I can create an unprivileged user
> > namespace, open /dev/fuse and mount a fuse filesystem, then pass the
> > /dev/fuse file descriptor to a process that is in the default user
> > namespace (and thus can use any uid/gid).  With that file descriptor
> > it can report that there is a setuid 0 executable on that filesystem.
> 
> Yes, and this would also be prevented by MNT_NOSUID, which would be a good idea
> anyway.  I just don't see the reason we'd want to allow clearing MNT_NOSUID in a
> private namespace.
> 
> So we don't currently see a use case for relaxing either the MNT_NOSUID
> restriction or for relaxing the requirement on the user namespace the fuse
> server is in.  Is that correct?
> 
> If so, we should leave both restrictions in place, since that allows the
> greatest flexibility in the future if either of those needs to be relaxed.

I'm not aware of specific use cases for either at this point. However,
Andy's patch [1] will limit suid to the set of namespaces where the user
who mounted the filesystem already has privileges. MNT_NOSUID would have
to be enforced in the vfs, and in that case we definitely need to decide
whether the policy is to implicitly add the flag or to fail the mount
attempt if the flag is not present [2].

> > > That might also help me understand how exactly user/pid namespaces work...
> > 
> > The idea of user/pid namespaces is to translate uids, gids and pids at
> > the edge of userspace into a kernel-internal form that can be used
> > everywhere.  In this case we get into the subtleties of which
> > translations make sense.
> 
> I mean, what's the point of translating uid, gids and pids?  What are the use
> cases?

Do you mean in general, or for fuse specifically? In general user/pid
namespaces are primarily used to implement containers with isolated sets
of resources (if you're unfamiliar with containers, think of something
which looks more or less like a VM from within but runs under the same
kernel as the host).

For fuse: an unprivileged user has a regular file containing a
filesystem image which they wish to mount inside a container using fuse.
Assume that in this container uid 0 maps to uid 100000 in the host, etc.
The filesystem image is likely to be using ids like 0, 1000, etc. If the
kernel translates these to kuid 0, 1000, ... then these will map to
overflowuid in the container, and the mount won't be very useful to the
user. What the user expects is that uid 0 in the filesystem will map to
uid 0 within the container (kuid 100000 in this example).

The pids aren't nearly so user-visible, but if the userspace fuse driver
is running in a pid namespace then pids must be translated into the
namespace to be useful to the driver.

Does that answer your questions?

> What are the rules on the translations between parent and child namespaces?
> 
> Is all this documented anywhere?

I haven't found any documentation. Eric?

As far as I can tell, though, the most important rules are to translate
to/from the kernel's internal representation as close to the
userspace/kernel boundary as possible, and to work only with the
kernel-internal representations (e.g. kuid_t, kgid_t, etc.) inside the
kernel.

The series of articles starting with [3] also serves as a good
introduction.

Thanks,
Seth

[1] http://lkml.kernel.org/g/252a4d87d99fc2b5fe4411c838f65b312c4e13cd.1413330857.git.luto@amacapital.net
[2] http://lkml.kernel.org/g/2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto@amacapital.net
[3] http://lwn.net/Articles/531114/