Message-ID: <87h7mxotww.fsf@x220.int.ebiederm.org>
Date: Sun, 31 Jan 2021 12:14:39 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: "Serge E. Hallyn" <serge@...lyn.com>
Cc: Miklos Szeredi <mszeredi@...hat.com>,
linux-fsdevel@...r.kernel.org, linux-unionfs@...r.kernel.org,
linux-security-module@...r.kernel.org,
linux-kernel@...r.kernel.org,
Christian Brauner <christian.brauner@...ntu.com>
Subject: Re: [PATCH 2/2] security.capability: fix conversions on getxattr
"Serge E. Hallyn" <serge@...lyn.com> writes:
> On Fri, Jan 29, 2021 at 04:55:29PM -0600, Eric W. Biederman wrote:
>> "Serge E. Hallyn" <serge@...lyn.com> writes:
>>
>> > On Thu, Jan 28, 2021 at 02:19:13PM -0600, Eric W. Biederman wrote:
>> >> "Serge E. Hallyn" <serge@...lyn.com> writes:
>> >>
>> >> > On Tue, Jan 19, 2021 at 07:34:49PM -0600, Eric W. Biederman wrote:
>> >> >> Miklos Szeredi <mszeredi@...hat.com> writes:
>> >> >>
>> >> >> > If a capability is stored on disk in v2 format cap_inode_getsecurity() will
>> >> >> > currently return in v2 format unconditionally.
>> >> >> >
>> >> >> > This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid,
>> >> >> > and so the same conversions performed on it.
>> >> >> >
>> >> >> > If the rootid cannot be mapped v3 is returned unconverted. Fix this so
>> >> >> > that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs
>> >> >> > user namespace in case of v2) cannot be mapped in the current user
>> >> >> > namespace.
>> >> >>
>> >> >> This looks like a good cleanup.
>> >> >
>> >> > Sorry, I'm not following. Why is this a good cleanup? Why should
>> >> > the xattr be shown as faked v3 in this case?
>> >>
>> >> If the reader is in &init_user_ns and the filesystem was mounted in a
>> >> user namespace, then the reader loses the information that the
>> >
>> > Can you be more precise about "filesystem was mounted in a user namespace"?
>> > Is this a FUSE thing, the fs is marked as being mounted in a non-init userns?
>> > If that's a possible case, then yes that must be represented as v3. Using
>> > is_v2header() may be the simpler way to check for that, but the more accurate
>> > check would be "is it v2 header and mounted by init_user_ns".
>>
>> I think the filesystems currently relevant are fuse, overlayfs, ramfs, tmpfs.
>>
>> > Basically yes, in as many cases as possible we want to just give a v2
>> > cap because more userspace knows what to do with that, but a non-init-userns
>> > mounted fs which provides a v2 fscap should have it represented as v3 cap
>> > with rootid being the kuid that owns the userns.
>>
>> That is the case that is being fixed in the patch.
>>
>> > Or am I still thinking wrongly? Wouldn't be entirely surprised :)
>>
>> No you got it.
>
> So then can we make faking a v3 gated on whether
> sb->s_user_ns != &init_user_ns ?
Sort of.
What Miklos's patch implements is always treating a v2 cap xattr on disk
as v3 internally.
> 	if (is_v2header((size_t) ret, cap)) {
> 		root = 0;
> 	} else if (is_v3header((size_t) ret, cap)) {
> 		nscap = (struct vfs_ns_cap_data *) tmpbuf;
> 		root = le32_to_cpu(nscap->rootid);
> 	} else {
> 		size = -EINVAL;
> 		goto out_free;
> 	}
Then v3 is returned if:
> 	/* If the root kuid maps to a valid uid in current ns, then return
> 	 * this as a nscap. */
> 	mappedroot = from_kuid(current_user_ns(), kroot);
> 	if (mappedroot != (uid_t)-1 && mappedroot != (uid_t)0) {
After that we verify that the fs capability can be seen by the caller
as a v2 cap xattr with:
> 	if (!rootid_owns_currentns(kroot)) {
> 		size = -EOVERFLOW;
> 		goto out_free;
Anything that passes that test and does not encounter a memory
allocation error is returned as a v2.
...
Which in practice does mean that if sb->s_user_ns != &init_user_ns,
then mappedroot != 0, and the cap is returned as a v3.
The rest of the logic takes care of all of the other crazy silly
combinations, like a user namespace that identity maps uid 0
and then mounts a filesystem.
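
In case it helps to see the whole decision in one place, here is a rough
userspace model of the flow described above. It is only a sketch, not the
kernel code: every toy_* name is made up for illustration, a user namespace
is reduced to a single uid mapping extent plus a parent pointer, and a kuid
is simply the uid as seen from the initial namespace. The ownership check
is assumed to walk from the reader's namespace up through its parents.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_UID ((uint32_t)-1)
#define TOY_CAP_V2 2
#define TOY_CAP_V3 3

/* Hypothetical stand-in for a user namespace: one mapping extent. */
struct toy_userns {
	const struct toy_userns *parent; /* NULL for the initial namespace */
	uint32_t first;       /* first uid inside the namespace */
	uint32_t lower_first; /* kuid that 'first' maps to */
	uint32_t count;       /* number of uids in the extent */
};

/* uid inside ns -> kuid, or TOY_INVALID_UID if unmapped. */
static uint32_t toy_make_kuid(const struct toy_userns *ns, uint32_t uid)
{
	if (uid < ns->first || uid - ns->first >= ns->count)
		return TOY_INVALID_UID;
	return ns->lower_first + (uid - ns->first);
}

/* kuid -> uid inside ns, or TOY_INVALID_UID if unmapped. */
static uint32_t toy_from_kuid(const struct toy_userns *ns, uint32_t kuid)
{
	if (kuid == TOY_INVALID_UID)
		return TOY_INVALID_UID;
	if (kuid < ns->lower_first || kuid - ns->lower_first >= ns->count)
		return TOY_INVALID_UID;
	return ns->first + (kuid - ns->lower_first);
}

/* Does kroot appear as uid 0 in the reader's ns or any of its ancestors? */
static int toy_rootid_owns_ns(const struct toy_userns *reader, uint32_t kroot)
{
	const struct toy_userns *ns;

	if (kroot == TOY_INVALID_UID)
		return 0;
	for (ns = reader; ns != NULL; ns = ns->parent)
		if (toy_from_kuid(ns, kroot) == 0)
			return 1;
	return 0;
}

/*
 * Returns TOY_CAP_V2 or TOY_CAP_V3 (the format the reader would see), or a
 * negative errno. *rootid_out is written only when TOY_CAP_V3 is returned.
 */
static int toy_getxattr_format(int on_disk, uint32_t disk_rootid,
			       const struct toy_userns *fs_ns,
			       const struct toy_userns *reader_ns,
			       uint32_t *rootid_out)
{
	uint32_t root, kroot, mappedroot;

	assert(rootid_out != NULL);

	/* Step 1: a v2 cap on disk is treated as v3 with rootid 0. */
	if (on_disk == TOY_CAP_V2)
		root = 0;
	else if (on_disk == TOY_CAP_V3)
		root = disk_rootid;
	else
		return -EINVAL;
	kroot = toy_make_kuid(fs_ns, root);

	/* Step 2: root maps to a valid non-zero uid for the reader -> v3. */
	mappedroot = toy_from_kuid(reader_ns, kroot);
	if (mappedroot != TOY_INVALID_UID && mappedroot != 0) {
		*rootid_out = mappedroot;
		return TOY_CAP_V3;
	}

	/* Step 3: otherwise the root must own the reader's namespace. */
	if (!toy_rootid_owns_ns(reader_ns, kroot))
		return -EOVERFLOW;
	return TOY_CAP_V2;
}

/* Example namespaces (assumed values, for illustration only). */
static const struct toy_userns toy_init_ns = { NULL, 0, 0, UINT32_MAX };
/* uids 0..65535 map to kuids 100000..165535 */
static const struct toy_userns toy_container_ns = { &toy_init_ns, 0, 100000, 65536 };
/* the silly case: a child namespace that identity maps uid 0 */
static const struct toy_userns toy_identity_ns = { &toy_init_ns, 0, 0, 65536 };
```

With these example namespaces, a v2 cap on a filesystem mounted in
toy_container_ns and read from toy_init_ns comes back as v3 with rootid
100000, while the same cap on a filesystem mounted in toy_identity_ns
still comes back as v2, because kuid 0 maps to uid 0 for the reader.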
Eric