linux-kernel - Re: [PATCH v4 04/11] net: reserve prefix

Message-ID: <20250509-leinwand-leiht-f1031edf9c71@brauner>
Date: Fri, 9 May 2025 07:54:43 +0200
From: Christian Brauner <brauner@...nel.org>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: alexander@...alicyn.com, bluca@...ian.org, daan.j.demeyer@...il.com, 
	davem@...emloft.net, david@...dahead.eu, edumazet@...gle.com, horms@...nel.org, 
	jack@...e.cz, jannh@...gle.com, kuba@...nel.org, lennart@...ttering.net, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, me@...dnzj.com, 
	netdev@...r.kernel.org, oleg@...hat.com, pabeni@...hat.com, viro@...iv.linux.org.uk, 
	zbyszek@...waw.pl
Subject: Re: [PATCH v4 04/11] net: reserve prefix

On Thu, May 08, 2025 at 02:47:45PM -0700, Kuniyuki Iwashima wrote:
> From: Christian Brauner <brauner@...nel.org>
> Date: Thu, 8 May 2025 08:16:29 +0200
> > On Wed, May 07, 2025 at 03:45:52PM -0700, Kuniyuki Iwashima wrote:
> > > From: Christian Brauner <brauner@...nel.org>
> > > Date: Wed, 07 May 2025 18:13:37 +0200
> > > > Add the reserved "linuxafsk/" prefix for AF_UNIX sockets and require
> > > > CAP_NET_ADMIN in the owning user namespace of the network namespace to
> > > > bind it. This will be used in next patches to support the coredump
> > > > socket but is a generally useful concept.
> > > 
> > > I really think we shouldn't reserve address and it should be
> > > configurable by users via core_pattern as with the other
> > > coredump types.
> > > 
> > > AF_UNIX doesn't support SO_REUSEPORT, so once the socket is
> > > dying, user can't start the new coredump listener until it's
> > > fully cleaned up, which adds unnecessary drawback.
> > 
> > This really doesn't matter.
> > 
> > > The semantic should be same with other types, and the todo
> > > for the coredump service is prepare file (file, process, socket)
> > > that can receive data and set its name to core_pattern.
> > 
> > We need to perform a capability check during bind() for the host's
> > coredump socket. Otherwise if the coredump server crashes an
> > unprivileged attacker can simply bind the address and receive all
> > coredumps from suid binaries.
> 
> As I mentioned in the previous thread, this can be better
> handled by BPF LSM with more fine-grained rule.
> 
> 1. register a socket with its name to BPF map
> 2. check if the destination socket is registered at connect
> 
> Even when LSM is not available, the cgroup BPF prog can make
> connect() fail if the destination name is not registered
> in the map.
> 
> > 
> > This is also a problem for legitimate coredump server updates. To change
> > the coredump address the coredump server must first setup a new socket
> > and then update core_pattern and then shutdown the old coredump socket.
> 
> So, for completeness, the server should set up a cgroup BPF
> prog to route the request for the old name to the new one.
> 
> Here, the bpf map above can be reused to check if the socket
> name is registered in the map or route to another socket in
> the map.
> 
> Then, the unprivileged issue below and the non-dumpable issue
> mentioned in the cover letter can also be resolved.
> 
> The server is expected to have CAP_SYS_ADMIN, so BPF should
> play a role.

This has been explained by multiple people over the course of this
thread already. It is simply not acceptable for basic kernel
functionality to be unsafe without the use of additional separate
subsystems. It is not ok to require BPF for a core kernel API to be
safely usable. It's irrelevant whether that's for security or cgroup
hooks. None of which we can require.

I won't even get this past Linus for that matter because he will rightly
NAK that hard and probably ask me whether I've paid any attention to
basic kernel development requirements in the last 10 years. Let alone
for coredumping which handles crashing suid binaries. I understand the
urge to outsource this problem to userspace but that's not ok.

Coredumping is a core kernel service and all options have to be safely
usable by themselves. In fact, that goes for any kernel API and
especially VFS APIs.

Using AF_UNIX sockets will be a major step forward in both simplicity
and security. We've compromised on every front so far. It's not too much
to ask for a basic permission check on a single well-known address
that's exposed as a kernel-level service.
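
To make the shape of that check concrete, here is a minimal userspace
sketch of the bind-time policy. It is a model under stated assumptions,
not the code from the patch: the function name check_reserved_bind() and
the has_net_admin flag are hypothetical, and the flag merely stands in
for the capability check the kernel would perform against the user
namespace that owns the network namespace. Only the "linuxafsk/" prefix
and the CAP_NET_ADMIN requirement come from the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reserved prefix named in the patch description. */
#define RESERVED_PREFIX "linuxafsk/"

/*
 * Userspace model of the bind-time policy (hypothetical helper).
 *
 * name:          socket name, len bytes, not necessarily NUL-terminated
 * has_net_admin: stand-in for the kernel's CAP_NET_ADMIN check
 *
 * Returns 0 if bind may proceed, -EPERM otherwise.
 */
int check_reserved_bind(const char *name, size_t len, bool has_net_admin)
{
	const size_t prefix_len = sizeof(RESERVED_PREFIX) - 1;

	assert(name != NULL);

	/* Names shorter than the prefix cannot match it. */
	if (len < prefix_len)
		return 0;

	/* Names outside the reserved prefix are unaffected. */
	if (memcmp(name, RESERVED_PREFIX, prefix_len) != 0)
		return 0;

	/* Reserved names require the capability. */
	return has_net_admin ? 0 : -EPERM;
}
```

With this policy an unprivileged caller can still bind any name outside
the prefix, while a name such as "linuxafsk/coredump" (an illustrative
name, not necessarily the one the series uses) is refused with -EPERM
unless the caller holds the capability.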