[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMw=ZnRC7Okmew=rrEocFuFn8hhrcergHciPjxFPuG4c6qH_Bw@mail.gmail.com>
Date: Tue, 13 May 2025 02:09:24 +0100
From: Luca Boccassi <bluca@...ian.org>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: alexander@...alicyn.com, brauner@...nel.org, daan.j.demeyer@...il.com,
daniel@...earbox.net, davem@...emloft.net, david@...dahead.eu,
edumazet@...gle.com, horms@...nel.org, jack@...e.cz, jannh@...gle.com,
kuba@...nel.org, lennart@...ttering.net, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-security-module@...r.kernel.org,
me@...dnzj.com, netdev@...r.kernel.org, oleg@...hat.com, pabeni@...hat.com,
viro@...iv.linux.org.uk, zbyszek@...waw.pl
Subject: Re: [PATCH v6 4/9] coredump: add coredump socket
On Tue, 13 May 2025 at 01:18, Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
>
> From: Luca Boccassi <bluca@...ian.org>
> Date: Mon, 12 May 2025 11:58:54 +0100
> > On Mon, 12 May 2025 at 09:56, Christian Brauner <brauner@...nel.org> wrote:
> > >
> > > Coredumping currently supports two modes:
> > >
> > > (1) Dumping directly into a file somewhere on the filesystem.
> > > (2) Dumping into a pipe connected to a usermode helper process
> > > spawned as a child of the system_unbound_wq or kthreadd.
> > >
> > > For simplicity I'm mostly ignoring (1). There's probably still some
> > > users of (1) out there but processing coredumps in this way can be
> > > considered adventurous especially in the face of set*id binaries.
> > >
> > > The most common option should be (2) by now. It works by allowing
> > > userspace to put a string into /proc/sys/kernel/core_pattern like:
> > >
> > > |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
> > >
> > > The "|" at the beginning indicates to the kernel that a pipe must be
> > > used. The path following the pipe indicator is a path to a binary that
> > > will be spawned as a usermode helper process. Any additional parameters
> > > pass information about the task that is generating the coredump to the
> > > binary that processes the coredump.
> > >
> > > In the example core_pattern shown above systemd-coredump is spawned as a
> > > usermode helper. There's various conceptual consequences of this
> > > (non-exhaustive list):
> > >
> > > - systemd-coredump is spawned with file descriptor number 0 (stdin)
> > > connected to the read-end of the pipe. All other file descriptors are
> > > closed. That specifically includes 1 (stdout) and 2 (stderr). This has
> > > already caused bugs because userspace assumed that this cannot happen
> > > (Whether or not this is a sane assumption is irrelevant.).
> > >
> > > - systemd-coredump will be spawned as a child of system_unbound_wq. So
> > > it is not a child of any userspace process and specifically not a
> > > child of PID 1. It cannot be waited upon and is in a weird hybrid
> > > upcall which are difficult for userspace to control correctly.
> > >
> > > - systemd-coredump is spawned with full kernel privileges. This
> > > necessitates all kinds of weird privilege dropping excercises in
> > > userspace to make this safe.
> > >
> > > - A new usermode helper has to be spawned for each crashing process.
> > >
> > > This series adds a new mode:
> > >
> > > (3) Dumping into an abstract AF_UNIX socket.
> > >
> > > Userspace can set /proc/sys/kernel/core_pattern to:
> > >
> > > @address SO_COOKIE
> > >
> > > The "@" at the beginning indicates to the kernel that the abstract
> > > AF_UNIX coredump socket will be used to process coredumps. The address
> > > is given by @address and must be followed by the socket cookie of the
> > > coredump listening socket.
> > >
> > > The socket cookie is used to verify the socket connection. If the
> > > coredump server restarts or crashes and someone recycles the socket
> > > address the kernel will detect that the address has been recycled as the
> > > socket cookie will have necessarily changed and refuse to connect.
> >
> > This dynamic/cookie prefix makes it impossible to use this with socket
> > activation units. The way systemd-coredump works is that every
> > instance is an independent templated unit, spawned when there's a
> > connection to the private socket. If the path was fixed, we could just
> > reuse the same mechanism, it would fit very nicely with minimal
> > changes.
>
> Note this version does not use prefix. Now it requires users to
> just pass the socket cookie via core_pattern so that the kernel
> can verify the peer.
Exactly - this means the pattern cannot be static in a sysctl.d early
on boot anymore, and has to be set dynamically by <something>. This is
a severe degradation over the status quo.
> > But because you need a "server" to be permanently running, this means
> > socket-based activation can no longer work, and systemd-coredump must
> > switch to a persistently-running mode.
>
> The only thing for systemd to do is assign a cookie after socket creation.
>
> As long as systemd hold the file descriptor of the socket, you don't need
> a dedicated "server" running permanently, and the fd can be passed around
> to a spawned/activated process.
There is no such facility, a socket is just a socket and there's no
infrastructure to randomly extract random information from one and
write it to some other random file in procfs, and I don't see why we
should add some super-special-case just for this, it sounds really
messy.
Also sockets can be and in fact are routinely restarted (eg: on
package upgrades), which would invalidate this whole scheme, and
result in a very racy setup. When packages are upgraded it's one of
the most complex workflows in modern distros, and it's very likely
that things start crashing exactly at that point, and with this
workflow it would mean we'll lose core files due to the race between
restarting the socket unit and <something> updating the pattern
accordingly.
Also we very much want to be able to spawn as many core handlers at
the same time as needed, which I don't see how can work with a cookie
that has to be unique per socket.
Sorry, but this particular approach seems completely unnecessary and
over-complicated, and doesn't seem to fit very well with how modern
userspace is set up today, and I don't see what actual problem it
would solve? If you need it for some particular use case that's
absolutely fine, but please add it later as another optional mode, so
that we don't have to degrade our use cases for it. That way everyone
gets what they want, and everyone's happy.
v5 was super nice and had everything we needed to massively improve
the status quo, with easy and straightforward changes and no real
drawbacks, so it would be really great if we could just go back to
that version, please. Thanks.
Powered by blists - more mailing lists