linux-kernel - Re: [PATCH v6 4/9] coredump: add coredump socket

Message-ID: <20250513000654.70344-1-kuniyu@amazon.com>
Date: Mon, 12 May 2025 17:06:50 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <brauner@...nel.org>
CC: <alexander@...alicyn.com>, <bluca@...ian.org>, <daan.j.demeyer@...il.com>,
	<daniel@...earbox.net>, <davem@...emloft.net>, <david@...dahead.eu>,
	<edumazet@...gle.com>, <horms@...nel.org>, <jack@...e.cz>,
	<jannh@...gle.com>, <kuba@...nel.org>, <kuniyu@...zon.com>,
	<lennart@...ttering.net>, <linux-fsdevel@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linux-security-module@...r.kernel.org>,
	<me@...dnzj.com>, <netdev@...r.kernel.org>, <oleg@...hat.com>,
	<pabeni@...hat.com>, <viro@...iv.linux.org.uk>, <zbyszek@...waw.pl>
Subject: Re: [PATCH v6 4/9] coredump: add coredump socket

From: Christian Brauner <brauner@...nel.org>
Date: Mon, 12 May 2025 10:55:23 +0200
> Coredumping currently supports two modes:
> 
> (1) Dumping directly into a file somewhere on the filesystem.
> (2) Dumping into a pipe connected to a usermode helper process
>     spawned as a child of the system_unbound_wq or kthreadd.
> 
> For simplicity I'm mostly ignoring (1). There are probably still some
> users of (1) out there, but processing coredumps in this way can be
> considered adventurous, especially in the face of set*id binaries.
> 
> The most common option should be (2) by now. It works by allowing
> userspace to put a string into /proc/sys/kernel/core_pattern like:
> 
>         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
> 
> The "|" at the beginning indicates to the kernel that a pipe must be
> used. The path following the pipe indicator is a path to a binary that
> will be spawned as a usermode helper process. Any additional parameters
> pass information about the task that is generating the coredump to the
> binary that processes the coredump.
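
To make the pipe-mode format concrete, here is a minimal sketch that splits
such a core_pattern into the helper path and its argument templates. The
function name parse_pipe_pattern and the SPECIFIERS table are made up for
illustration; the specifier meanings follow core(5). The kernel's own
splitting and template expansion are more involved, so treat this as an
approximation only.

```python
# Illustrative only: the kernel does its own argument splitting and
# template expansion; str.split() merely approximates it.
SPECIFIERS = {
    "%P": "PID of the dumped process in the initial PID namespace",
    "%u": "numeric real UID of the dumped process",
    "%g": "numeric real GID of the dumped process",
    "%s": "number of the signal causing the dump",
    "%t": "time of the dump, in seconds since the Epoch",
    "%c": "core file size soft resource limit of the crashing process",
    "%h": "hostname",
}


def parse_pipe_pattern(pattern: str):
    """Return (helper_path, [argument templates]) for a pipe-mode pattern."""
    if not pattern.startswith("|"):
        raise ValueError("not a pipe-mode core_pattern")
    fields = pattern[1:].split()
    if not fields:
        raise ValueError("missing helper path")
    return fields[0], fields[1:]
```

Calling parse_pipe_pattern() on the systemd-coredump pattern quoted above
yields the helper path plus the seven templates, each of which can be
looked up in SPECIFIERS.
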
> 
> In the example core_pattern shown above, systemd-coredump is spawned as a
> usermode helper. There are various conceptual consequences of this
> (non-exhaustive list):
> 
> - systemd-coredump is spawned with file descriptor number 0 (stdin)
>   connected to the read-end of the pipe. All other file descriptors are
>   closed. That specifically includes 1 (stdout) and 2 (stderr). This has
>   already caused bugs because userspace assumed that this cannot happen
>   (whether or not this is a sane assumption is irrelevant).
> 
> - systemd-coredump will be spawned as a child of system_unbound_wq. So
>   it is not a child of any userspace process and specifically not a
>   child of PID 1. It cannot be waited upon and is in a weird hybrid
>   upcall state, which is difficult for userspace to control correctly.
> 
> - systemd-coredump is spawned with full kernel privileges. This
>   necessitates all kinds of weird privilege dropping exercises in
>   userspace to make this safe.
> 
> - A new usermode helper has to be spawned for each crashing process.
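
As an illustration of the first point, a helper that wants to be robust
against starting with only fd 0 open could defensively point any closed
descriptors at /dev/null before doing anything else. This is only a sketch;
fd_is_open and ensure_fds_open are hypothetical names, not taken from
systemd-coredump or any other existing helper.

```python
import os


def fd_is_open(fd: int) -> bool:
    """Return True if `fd` currently refers to an open file description."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def ensure_fds_open(fds=(1, 2)) -> list:
    """Point every closed descriptor in `fds` at /dev/null.

    Returns the descriptors that had to be fixed up.
    """
    fixed = []
    for fd in fds:
        if fd_is_open(fd):
            continue
        null = os.open(os.devnull, os.O_RDWR)
        if null != fd:
            # open() returned some other free slot; move it into place.
            os.dup2(null, fd)
            os.close(null)
        fixed.append(fd)
    return fixed
```

Calling ensure_fds_open() first thing means later writes to stdout or
stderr cannot land in an unrelated file that happened to be opened as
fd 1 or 2.
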
> 
> This series adds a new mode:
> 
> (3) Dumping into an abstract AF_UNIX socket.
> 
> Userspace can set /proc/sys/kernel/core_pattern to:
> 
>         @address SO_COOKIE
> 
> The "@" at the beginning indicates to the kernel that the abstract
> AF_UNIX coredump socket will be used to process coredumps. The address
> is given by @address and must be followed by the socket cookie of the
> coredump listening socket.
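
The three modes can be told apart by the first character of core_pattern.
Below is a rough sketch of that classification plus a parser for the
"@address SO_COOKIE" form. Both function names are hypothetical, and the
sketch assumes the cookie is written as a decimal integer, which the
description above does not spell out.

```python
def classify_core_pattern(pattern: str) -> str:
    """Classify a core_pattern as 'pipe', 'socket' or 'file'."""
    if pattern.startswith("|"):
        return "pipe"
    if pattern.startswith("@"):
        return "socket"
    return "file"


def parse_socket_pattern(pattern: str):
    """Split '@address SO_COOKIE' into (address, cookie).

    Assumption: the cookie is a decimal integer separated from the
    address by whitespace.
    """
    if not pattern.startswith("@"):
        raise ValueError("not a socket-mode core_pattern")
    fields = pattern[1:].split()
    if len(fields) != 2:
        raise ValueError("expected '@address SO_COOKIE'")
    address, cookie = fields
    return address, int(cookie)
```

For a made-up value such as "@example-coredump 12345" this returns
("example-coredump", 12345).
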
> 
> The socket cookie is used to verify the socket connection. If the
> coredump server restarts or crashes and someone recycles the socket
> address the kernel will detect that the address has been recycled as the
> socket cookie will have necessarily changed and refuse to connect.
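
From the server side, the cookie can be read with getsockopt(SO_COOKIE) on
the listening socket. The sketch below binds an abstract AF_UNIX socket,
reads its cookie and formats a matching core_pattern string. Assumptions:
SO_COOKIE is 57 (the asm-generic value; a few architectures differ), the
cookie is written in decimal, and all function names are illustrative.

```python
import socket
import struct

# Not every Python version exports SO_COOKIE; 57 is the asm-generic value.
SO_COOKIE = getattr(socket, "SO_COOKIE", 57)


def decode_cookie(raw: bytes) -> int:
    """Decode the 64-bit cookie returned by getsockopt(SO_COOKIE)."""
    return struct.unpack("=Q", raw)[0]


def socket_cookie(sock: socket.socket) -> int:
    """Return the kernel-assigned cookie of `sock`."""
    return decode_cookie(sock.getsockopt(socket.SOL_SOCKET, SO_COOKIE, 8))


def listen_abstract(name: str) -> socket.socket:
    """Listen on the abstract AF_UNIX address `name` (leading NUL byte)."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind("\0" + name)
    srv.listen()
    return srv


def core_pattern_for(name: str, cookie: int) -> str:
    """Format '@address SO_COOKIE' (decimal cookie is an assumption)."""
    return f"@{name} {cookie}"
```

Closing the listener and binding the same name again yields a socket with
a different cookie, which is what lets the kernel notice that the address
has been recycled.
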
> 
> The coredump socket is located in the initial network namespace. When a
> task coredumps it opens a client socket in the initial network namespace
> and connects to the coredump socket.
> 
> - The coredump server uses SO_PEERPIDFD to get a stable handle on the
>   connected crashing task. The retrieved pidfd will provide a stable
>   reference even if the crashing task gets SIGKILLed while generating
>   the coredump.
> 
> - By setting core_pipe_limit non-zero userspace can guarantee that the
>   crashing task cannot be reaped behind its back and thus process all
>   necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to
>   detect whether /proc/<pid> still refers to the same process.
> 
>   The core_pipe_limit isn't used to rate-limit connections to the
>   socket. This can simply be done via AF_UNIX sockets directly.
> 
> - The pidfd for the crashing task will grow new information about how
>   the task coredumps.
> 
> - The coredump server should mark itself as non-dumpable.
> 
> - A container coredump server in a separate network namespace can simply
>   bind to another well-known address and systemd-coredump forwards
>   coredumps to the container.
> 
> - Coredumps could in the future also be handled via per-user/session
>   coredump servers that run only with that user's privileges.
> 
>   The coredump server listens on the coredump socket and accepts a
>   new coredump connection. It then retrieves SO_PEERPIDFD for the
>   client, inspects uid/gid and hands the accepted client to the user's
>   own coredump handler which runs with the user's privileges only
>   (it must of course pay close attention to not forward crashing suid
>   binaries).
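
A few of the server-side points above can be sketched in userspace terms:
marking the server non-dumpable via prctl(PR_SET_DUMPABLE), fetching the
peer's pidfd with SO_PEERPIDFD, and polling that pidfd to see whether the
task is already gone. Assumptions: SO_PEERPIDFD is 77 (asm-generic value,
needs Linux 6.5 or later), PR_GET_DUMPABLE/PR_SET_DUMPABLE are 3 and 4,
and the function names are made up for this example.

```python
import ctypes
import os
import select
import socket

# Not every Python version exports SO_PEERPIDFD; 77 is the asm-generic value.
SO_PEERPIDFD = getattr(socket, "SO_PEERPIDFD", 77)
PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4

_libc = ctypes.CDLL(None, use_errno=True)


def mark_non_dumpable() -> None:
    """Make the calling process non-dumpable (PR_SET_DUMPABLE = 0)."""
    if _libc.prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_dumpable() -> bool:
    """Return True if the calling process is currently dumpable."""
    return _libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1


def peer_pidfd(conn: socket.socket) -> int:
    """Return a pidfd for the peer of a connected AF_UNIX socket."""
    return conn.getsockopt(socket.SOL_SOCKET, SO_PEERPIDFD)


def process_exited(pidfd: int) -> bool:
    """A pidfd polls readable once the process it refers to has exited."""
    poller = select.poll()
    poller.register(pidfd, select.POLLIN)
    return bool(poller.poll(0))
```

A server would call mark_non_dumpable() once at startup and peer_pidfd()
on each accepted connection, closing the returned descriptor with
os.close() when done.
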
> 
> The new coredump socket will allow userspace to not have to rely on
> usermode helpers for processing coredumps and provides a safer way to
> handle them instead of relying on super privileged coredumping helpers
> that have caused and continue to cause significant CVEs.
> 
> This will also be significantly more lightweight since no fork()+exec()
> for the usermodehelper is required for each crashing process. The
> coredump server in userspace can e.g., just keep a worker pool.
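
For illustration, a worker pool on the server side might look roughly like
the following: one accept loop hands each connection to a thread pool, and
each worker reads the dump until EOF. The names drain and serve and the
max_conns bound are choices made for this sketch; a real server would loop
indefinitely and store the data rather than just count bytes.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def drain(conn: socket.socket) -> int:
    """Read one coredump connection until EOF; return the byte count."""
    total = 0
    with conn:
        while chunk := conn.recv(65536):
            total += len(chunk)
    return total


def serve(srv: socket.socket, pool: ThreadPoolExecutor, max_conns: int) -> list:
    """Accept `max_conns` connections and process them on the pool.

    Returns the number of bytes received on each connection.
    """
    futures = []
    for _ in range(max_conns):
        conn, _addr = srv.accept()
        futures.append(pool.submit(drain, conn))
    return [f.result() for f in futures]
```

Because the workers already exist, handling another crash costs one
accept() and a task submission instead of a fork()+exec().
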
> 
> Signed-off-by: Christian Brauner <brauner@...nel.org>

Reviewed-by: Kuniyuki Iwashima <kuniyu@...zon.com>

Thanks!