linux-kernel - Re: [PATCH v6 4/9] coredump: add coredump socket

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMw=ZnTF9EVV+E+bXTz1je3VT+OwDPAzbbFy7G02zBjeCpqxFA@mail.gmail.com>
Date: Mon, 12 May 2025 11:58:54 +0100
From: Luca Boccassi <bluca@...ian.org>
To: Christian Brauner <brauner@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, Jann Horn <jannh@...gle.com>, 
	Daniel Borkmann <daniel@...earbox.net>, Kuniyuki Iwashima <kuniyu@...zon.com>, 
	Eric Dumazet <edumazet@...gle.com>, Oleg Nesterov <oleg@...hat.com>, 
	"David S. Miller" <davem@...emloft.net>, Alexander Viro <viro@...iv.linux.org.uk>, 
	Daan De Meyer <daan.j.demeyer@...il.com>, David Rheinsberg <david@...dahead.eu>, 
	Jakub Kicinski <kuba@...nel.org>, Jan Kara <jack@...e.cz>, 
	Lennart Poettering <lennart@...ttering.net>, Mike Yuan <me@...dnzj.com>, Paolo Abeni <pabeni@...hat.com>, 
	Simon Horman <horms@...nel.org>, Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>, 
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org, 
	linux-security-module@...r.kernel.org, 
	Alexander Mikhalitsyn <alexander@...alicyn.com>
Subject: Re: [PATCH v6 4/9] coredump: add coredump socket

On Mon, 12 May 2025 at 09:56, Christian Brauner <brauner@...nel.org> wrote:
>
> Coredumping currently supports two modes:
>
> (1) Dumping directly into a file somewhere on the filesystem.
> (2) Dumping into a pipe connected to a usermode helper process
>     spawned as a child of the system_unbound_wq or kthreadd.
>
> For simplicity I'm mostly ignoring (1). There's probably still some
> users of (1) out there but processing coredumps in this way can be
> considered adventurous especially in the face of set*id binaries.
>
> The most common option should be (2) by now. It works by allowing
> userspace to put a string into /proc/sys/kernel/core_pattern like:
>
>         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
>
> The "|" at the beginning indicates to the kernel that a pipe must be
> used. The path following the pipe indicator is a path to a binary that
> will be spawned as a usermode helper process. Any additional parameters
> pass information about the task that is generating the coredump to the
> binary that processes the coredump.
>
> In the example core_pattern shown above systemd-coredump is spawned as a
> usermode helper. There's various conceptual consequences of this
> (non-exhaustive list):
>
> - systemd-coredump is spawned with file descriptor number 0 (stdin)
>   connected to the read-end of the pipe. All other file descriptors are
>   closed. That specifically includes 1 (stdout) and 2 (stderr). This has
>   already caused bugs because userspace assumed that this cannot happen
>   (Whether or not this is a sane assumption is irrelevant.).
>
> - systemd-coredump will be spawned as a child of system_unbound_wq. So
>   it is not a child of any userspace process and specifically not a
>   child of PID 1. It cannot be waited upon and is in a weird hybrid
>   upcall which are difficult for userspace to control correctly.
>
> - systemd-coredump is spawned with full kernel privileges. This
>   necessitates all kinds of weird privilege dropping excercises in
>   userspace to make this safe.
>
> - A new usermode helper has to be spawned for each crashing process.
>
> This series adds a new mode:
>
> (3) Dumping into an abstract AF_UNIX socket.
>
> Userspace can set /proc/sys/kernel/core_pattern to:
>
>         @address SO_COOKIE
>
> The "@" at the beginning indicates to the kernel that the abstract
> AF_UNIX coredump socket will be used to process coredumps. The address
> is given by @address and must be followed by the socket cookie of the
> coredump listening socket.
>
> The socket cookie is used to verify the socket connection. If the
> coredump server restarts or crashes and someone recycles the socket
> address the kernel will detect that the address has been recycled as the
> socket cookie will have necessarily changed and refuse to connect.

This dynamic/cookie prefix makes it impossible to use this with socket
activation units. The way systemd-coredump works is that every
instance is an independent templated unit, spawned when there's a
connection to the private socket. If the path was fixed, we could just
reuse the same mechanism, it would fit very nicely with minimal
changes.

But because you need a "server" to be permanently running, this means
socket-based activation can no longer work, and systemd-coredump must
switch to a persistently-running mode. This is a severe degradation of
functionality, will continuously waste CPU/memory resources for no
good reasons, and makes the whole thing more fragile and complex, as
if there are any issues with this server, you start losing core files.
And honestly I don't really see the point? Setting the pattern is a
privileged operation anyway. systemd manages the socket with a socket
unit and again that's privileged already.

Could we drop this cookie prefix and go back to the previous version
(v5), please? Or if there is some specific non-systemd use case in
mind that I am not aware of, have both options, so that we can use the
simpler and more straightforward one with systemd-coredump.
Thanks!