linux-kernel - Re: fsconfig parsing bugs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220901145028.3lndphzrylsyqx5o@wittgenstein>
Date:   Thu, 1 Sep 2022 16:50:28 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Seth Jenkins <sethjenkins@...gle.com>
Cc:     viro@...iv.linux.org.uk, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, dhowells@...hat.com,
        Jann Horn <jannh@...gle.com>,
        Natalie Silvanovich <natashenka@...gle.com>
Subject: Re: fsconfig parsing bugs

On Wed, Aug 31, 2022 at 04:12:21PM -0700, Seth Jenkins wrote:
> The codebase-wide refactor efforts to using the latest fs mounting API
> (with support for fsopen/fsconfig/fsmount etc.) have introduced some
> bugs into mount configuration parsing in several parse_param handlers,
> most notably shmem_parse_one() which can be accessed from a userns.
> There are several cases where the following code pattern is used:
> 
> ctx->value = <expression>
> if(ctx->value is invalid)
>    goto fail;
> ctx->seen |= SHMEM_SEEN_X;
> break;
> 
> However, this coding pattern does not work in the case where multiple
> fsconfig calls are made. For example, if I were to call fsconfig with
> the key "nr_blocks" twice, the first time with a valid value, and the
> second time with an invalid value, the invalid value will be persisted
> and used upon creation of the mount for the value of ctx->blocks, and
> consequently for sbinfo->max_blocks.
> 
> This code pattern is used for Opt_nr_blocks, Opt_nr_inodes, Opt_uid,
> Opt_gid and Opt_huge. Probably the proper thing to do is to check for
> validity before assigning the value to the shmem_options struct in the
> fs_context.
> 
> We also see this code pattern replicated throughout other filesystems
> for uid/gid resolution, including hugetlbfs, FUSE, ntfs3 and ffs.
> 
> The other outstanding issue I noticed comes from the fact that
> fsconfig syscalls may occur in a different userns than that which
> called fsopen. That means that resolving the uid/gid via
> current_user_ns() can save a kuid that isn't mapped in the associated
> namespace when the filesystem is finally mounted. This means that it
> is possible for an unprivileged user to create files owned by any
> group in a tmpfs mount (since we can set the SUID bit on the tmpfs
> directory), or a tmpfs that is owned by any user, including the root
> group/user. This is probably outside the original intention of this
> code.
> 
> The fix for this bug is not quite so simple as the others. The options
> that I've assessed are:
> 
> - Resolve the kuid/kgid via the fs_context namespace - this does
> however mean that any task outside the fsopen'ing userns that tries to
> set the uid/gid of a tmpfs will have to know that the uid/gid will be
> resolved by a different namespace than that which the current task is
> in. It also subtly changes the behavior of this specific subsystem in
> a userland visible way.
> - Globally disallow fsconfig calls originating from outside the
> fs_context userns - This is a more robust solution that would prevent
> any similar bugs, but it may impinge on valid mount use-cases. It's
> the best from a security standpoint and if it's determined that it was
> not in the original intention to be juggling user/mount namespaces
> this way, it's probably the ideal solution.
> - Throw EINVAL if the kuid specified cannot be mapped in the mounting
> userns (and/or potentially in the fs_context userns) - This is
> probably the solution that remains most faithful to all potential
> use-cases, but it doesn't reduce the potential for variants in the
> future in other parts of the codebase and it also introduces some
> slight derivative logic bug risk.
> - Don't resolve the uid/gid specified in fsconfig at all, and resolve
> it during mount-time when calling an associated fill_super. This is
> precedented and used in other parts of the codebase, but specificity
> is lost in the final error case since an end-user cannot easily
> attribute a mount failure to an unmappable uid.
> 
> I've also attached a PoC for this bug that demonstrates that an
> unprivileged user can create files/directories with root uid/gid's.
> There is no deadline for this issue as we can't see any obvious way to
> cross a privilege boundary with this.
> 
> Thanks in advance!

I'm involved in 2 large projects that make use of the new mount api LXC
and CRIU. None of them call fsconfig() outside of the target user
namespace. util-linux mount(2) does not yet use the new mount api and so
can't be affected either but will in maybe even the next release.
Additionally, glibc 2.36 is the first glibc with support for the new
mount api which just released. So all users before that users would have
to write their own system call wrappers so I think we have some liberty
here.

I think this is too much of a restriction to require that fsopen() and
fsconfig() userns must match in order to set options. It is pretty handy
to be able to set mount options outside of fc->user_ns. And we'd
definitely want to make use of this in the future.

So ideally, we just switch all filesystems that are mountable in userns
over to use fc->user_ns. There's not really a big regression risk here
because it's not used in userns workloads widely today. Taking a close
look, the affected filesystems are devpts and tmpfs. Having them rely on
fc->user_ns aligns them with how fuse does it today.