netdev - Re: [PATCH] net: sysctl: fix edge case wrt. sysctl write access

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8e2475cba51a078e0e12c219d81ae8f14e2196b3.camel@redhat.com>
Date: Thu, 14 Dec 2023 18:16:21 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Maciej Żenczykowski <maze@...gle.com>
Cc: Linux Network Development Mailing List <netdev@...r.kernel.org>,  "David
 S . Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Flavio Crisciani <fcrisciani@...gle.com>,
 "Theodore Y. Ts'o" <tytso@...gle.com>, "Eric W. Biederman"
 <ebiederm@...ssion.com>, Dmitry Torokhov <dmitry.torokhov@...il.com>
Subject: Re: [PATCH] net: sysctl: fix edge case wrt. sysctl write access

On Thu, 2023-12-14 at 14:17 +0100, Maciej Żenczykowski wrote:
> On Thu, Dec 14, 2023 at 10:37 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > 
> > On Sun, 2023-12-10 at 03:10 -0800, Maciej Żenczykowski wrote:
> > > The clear intent of net_ctl_permissions() is that having CAP_NET_ADMIN
> > > grants write access to networking sysctls.
> > > 
> > > However, it turns out there is an edge case where this is insufficient:
> > > inode_permission() has an additional check on HAS_UNMAPPED_ID(inode)
> > > which can return -EACCES and thus block *all* write access.
> > > 
> > > Note: AFAICT this check is wrt. the uid/gid mapping that was
> > > active at the time the filesystem (ie. proc) was mounted.
> > > 
> > > In order for this check to not fail, we need net_ctl_set_ownership()
> > > to set valid uid/gid.  It is not immediately clear what value
> > > to use, nor what values are guaranteed to work.
> > > It does make sense that /proc/sys/net appear to be owned by root
> > > from within the netns owning userns.  As such we only modify
> > > what happens if the code fails to map uid/gid 0.
> > > Currently the code just fails to do anything, which in practice
> > > results in using the zeroes of freshly allocated memory,
> > > and we thus end up with global root.
> > > With this change we instead use the uid/gid of the owning userns.
> > > While it is probably (?) theoretically possible for this to *also*
> > > be unmapped from the /proc filesystem's point of view, this seems
> > > much less likely to happen in practice.
> > > 
> > > The old code is observed to fail in a relatively complex setup,
> > > within a global root created user namespace with selectively
> > > mapped uid/gids (not including global root) and /proc mounted
> > > afterwards (so this /proc mount does not have global root mapped).
> > > Within this user namespace another non privileged task creates
> > > a new user namespace, maps it's own uid/gid (but not uid/gid 0),
> > > and then creates a network namespace.  It cannot write to networking
> > > sysctls even though it does have CAP_NET_ADMIN.
> > 
> > I'm wondering if this specific scenario should be considered a setup
> > issue, and should be solved with a different configuration? I would
> > love to hear others opinions!
> 
> While it could be fixed in userspace.  I don't think it should:
> 
> The global root uid/gid are very intentionally not mapped in (as a
> security feature).
> So that part isn't changeable (it's also a system daemon and not under
> user control).
> 
> The user namespace very intentionally maps uid->uid and not 0->uid.
> Here there's theoretically more leeway... because it is at least under
> user control.
> However here this is done for good reason as well.
> There's plenty of code that special cases uid=0, both in the kernel
> (for example capability handling across exec) and in various userspace
> libraries.  It's unrealistic to fix them all.
> Additionally it's nice to have semi-transparent user namespaces,
> which are security barriers but don't remap uids - remapping causes confusion.
> (ie. the uid is either mapped or not, but if it is mapped it's a 1:1 mapping)
> 
> As for why?  Because uids as visible to userspace may leak across user
> namespace boundaries,
> either when talking to other system daemons or when talking across machines.
> It's pretty easy (and common) to have uids that are globally unique
> and meaningful in a cluster of machines.
> Again, this is *theoretically* fixable in userspace, but not actually
> a realistic expectation.
> 
> btw. even outside of clusters of machines, I also run some
> user/uts/net namespace using
> code on my personal desktop (this does require some minor hacks to
> unshare/mount binaries),
> and again I intentionally map uid->uid and 0->uid, because this makes
> my username show up as 'maze' and not 'root'.

I see, thanks for all the details.

> This is *clearly* a kernel bug that this doesn't just work.
> (side note: there's a very similar issue in proc_net.c which I haven't
> gotten around to fixing yet, because it looks to be more complex to
> convince oneself it's safe to do)

Indeed the potential security related issue is the root cause of my
concerns here. I could not identify any such problem, but I must admit
the uid mapping is not the kernel I know better.

I definitely could use another pair of eyeballs here ;)

Cheers,

Paolo