lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMw=ZnQJ3h3ho2Q2L07AOuUaOr05HNm1aMoSyLTS_OD+w9BSPQ@mail.gmail.com>
Date: Wed, 14 May 2025 23:38:45 +0100
From: Luca Boccassi <bluca@...ian.org>
To: Christian Brauner <brauner@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	netdev@...r.kernel.org, linux-security-module@...r.kernel.org
Subject: Re: [PATCH v7 0/9] coredump: add coredump socket

On Wed, 14 May 2025 at 23:04, Christian Brauner <brauner@...nel.org> wrote:
>
> Coredumping currently supports two modes:
>
> (1) Dumping directly into a file somewhere on the filesystem.
> (2) Dumping into a pipe connected to a usermode helper process
>     spawned as a child of the system_unbound_wq or kthreadd.
>
> For simplicity I'm mostly ignoring (1). There's probably still some
> users of (1) out there but processing coredumps in this way can be
> considered adventurous especially in the face of set*id binaries.
>
> The most common option should be (2) by now. It works by allowing
> userspace to put a string into /proc/sys/kernel/core_pattern like:
>
>         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
>
> The "|" at the beginning indicates to the kernel that a pipe must be
> used. The path following the pipe indicator is a path to a binary that
> will be spawned as a usermode helper process. Any additional parameters
> pass information about the task that is generating the coredump to the
> binary that processes the coredump.
>
> In the example core_pattern shown above systemd-coredump is spawned as a
> usermode helper. There's various conceptual consequences of this
> (non-exhaustive list):
>
> - systemd-coredump is spawned with file descriptor number 0 (stdin)
>   connected to the read-end of the pipe. All other file descriptors are
>   closed. That specifically includes 1 (stdout) and 2 (stderr). This has
>   already caused bugs because userspace assumed that this cannot happen
>   (Whether or not this is a sane assumption is irrelevant.).
>
> - systemd-coredump will be spawned as a child of system_unbound_wq. So
>   it is not a child of any userspace process and specifically not a
>   child of PID 1. It cannot be waited upon and is in a weird hybrid
>   upcall which are difficult for userspace to control correctly.
>
> - systemd-coredump is spawned with full kernel privileges. This
>   necessitates all kinds of weird privilege dropping excercises in
>   userspace to make this safe.
>
> - A new usermode helper has to be spawned for each crashing process.
>
> This series adds a new mode:
>
> (3) Dumping into an abstract AF_UNIX socket.
>
> Userspace can set /proc/sys/kernel/core_pattern to:
>
>         @/path/to/coredump.socket
>
> The "@" at the beginning indicates to the kernel that an AF_UNIX
> coredump socket will be used to process coredumps.
>
> The coredump socket must be located in the initial mount namespace.
> When a task coredumps it opens a client socket in the initial network
> namespace and connects to the coredump socket.
>
> - The coredump server should use SO_PEERPIDFD to get a stable handle on
>   the connected crashing task. The retrieved pidfd will provide a stable
>   reference even if the crashing task gets SIGKILLed while generating
>   the coredump.
>
> - When a coredump connection is initiated use the socket cookie as the
>   coredump cookie and store it in the pidfd. The receiver can now easily
>   authenticate that the connection is coming from the kernel.
>
>   Unless the coredump server expects to handle connection from
>   non-crashing task it can validate that the connection has been made from
>   a crashing task:
>
>      fd_coredump = accept4(fd_socket, NULL, NULL, SOCK_CLOEXEC);
>      getsockopt(fd_coredump, SOL_SOCKET, SO_PEERPIDFD, &fd_peer_pidfd, &fd_peer_pidfd_len);
>
>      struct pidfd_info info = {
>              info.mask = PIDFD_INFO_EXIT | PIDFD_INFO_COREDUMP,
>      };
>
>      ioctl(pidfd, PIDFD_GET_INFO, &info);
>      /* Refuse connections that aren't from a crashing task. */
>      if (!(info.mask & PIDFD_INFO_COREDUMP) || !(info.coredump_mask & PIDFD_COREDUMPED) )
>              close(fd_coredump);
>
>      /*
>       * Make sure that the coredump cookie matches the connection cookie.
>       * If they don't it's not the coredump connection from the kernel.
>       * We'll get another connection request in a bit.
>       */
>      getsocketop(fd_coredump, SOL_SOCKET, SO_COOKIE, &peer_cookie, &peer_cookie_len);
>      if (!info.coredump_cookie || (info.coredump_cookie != peer_cookie))
>              close(fd_coredump);
>
>   The kernel guarantees that by the time the connection is made the
>   coredump info is available.
>
> - By setting core_pipe_limit non-zero userspace can guarantee that the
>   crashing task cannot be reaped behind it's back and thus process all
>   necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to
>   detect whether /proc/<pid> still refers to the same process.
>
>   The core_pipe_limit isn't used to rate-limit connections to the
>   socket. This can simply be done via AF_UNIX socket directly.
>
> - The pidfd for the crashing task will contain information how the task
>   coredumps. The PIDFD_GET_INFO ioctl gained a new flag
>   PIDFD_INFO_COREDUMP which can be used to retreive the coredump
>   information.
>
>   If the coredump gets a new coredump client connection the kernel
>   guarantees that PIDFD_INFO_COREDUMP information is available.
>   Currently the following information is provided in the new
>   @coredump_mask extension to struct pidfd_info:
>
>   * PIDFD_COREDUMPED is raised if the task did actually coredump.
>   * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g.,
>     undumpable).
>   * PIDFD_COREDUMP_USER is raised if this is a regular coredump and
>     doesn't need special care by the coredump server.
>   * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be
>     treated as sensitive and the coredump server should restrict access
>     to the generated coredump to sufficiently privileged users.
>
> - The coredump server should mark itself as non-dumpable.
>
> - A container coredump server in a separate network namespace can simply
>   bind to another well-know address and systemd-coredump fowards
>   coredumps to the container.
>
> - Coredumps could in the future also be handled via per-user/session
>   coredump servers that run only with that users privileges.
>
>   The coredump server listens on the coredump socket and accepts a
>   new coredump connection. It then retrieves SO_PEERPIDFD for the
>   client, inspects uid/gid and hands the accepted client to the users
>   own coredump handler which runs with the users privileges only
>   (It must of coure pay close attention to not forward crashing suid
>   binaries.).
>
> The new coredump socket will allow userspace to not have to rely on
> usermode helpers for processing coredumps and provides a safer way to
> handle them instead of relying on super privileged coredumping helpers.
>
> This will also be significantly more lightweight since no fork()+exec()
> for the usermodehelper is required for each crashing process. The
> coredump server in userspace can just keep a worker pool.
>
> Signed-off-by: Christian Brauner <brauner@...nel.org>
> ---
> Changes in v7:
> - Use regular AF_UNIX sockets instead of abstract AF_UNIX sockets. This
>   fixes the permission problems as userspace can ensure that the socket
>   path cannot be rebound by arbitrary unprivileged userspace via regular
>   path permissions.
>
>   This means:
>   - We don't require privilege checks on a reserved abstract AF_UNIX
>     namespace
>   - We don't require a fixed address for the coredump socket.
>   - We don't need to use abstract unix sockets at all.
>   - We don't need  special socket cookie magic in the
>     /proc/sys/kernel/core_pattern handler.
>   - We are able to set /proc/sys/kernel/core_pattern statically without
>     having any socket bound.
>
>   That's all complaints addressed.
>
>   Simply massage unix_find_bsd() to be able to handle this and always
>   lookup the coredump socket in the initial mount namespace with
>   appropriate credentials. The same thing we do for looking up other
>   parts in the kernel like this. Only the lookup happens this way.
>   Actual connection credentials are obviously from the coredumping task.
> - Link to v6: https://lore.kernel.org/20250512-work-coredump-socket-v6-0-c51bc3450727@kernel.org
>
> Changes in v6:
> - Use the socket cookie to verify the coredump server.
> - Link to v5: https://lore.kernel.org/20250509-work-coredump-socket-v5-0-23c5b14df1bc@kernel.org
>
> Changes in v5:
> - Don't use a prefix just the specific address.
> - Link to v4: https://lore.kernel.org/20250507-work-coredump-socket-v4-0-af0ef317b2d0@kernel.org
>
> Changes in v4:
> - Expose the coredump socket cookie through the pidfd. This allows the
>   coredump server to easily recognize coredump socket connections.
> - Link to v3: https://lore.kernel.org/20250505-work-coredump-socket-v3-0-e1832f0e1eae@kernel.org
>
> Changes in v3:
> - Use an abstract unix socket.
> - Add documentation.
> - Add selftests.
> - Link to v2: https://lore.kernel.org/20250502-work-coredump-socket-v2-0-43259042ffc7@kernel.org
>
> Changes in v2:
> - Expose dumpability via PIDFD_GET_INFO.
> - Place COREDUMP_SOCK handling under CONFIG_UNIX.
> - Link to v1: https://lore.kernel.org/20250430-work-coredump-socket-v1-0-2faf027dbb47@kernel.org
>
> ---
> Christian Brauner (9):
>       coredump: massage format_corname()
>       coredump: massage do_coredump()
>       coredump: reflow dump helpers a little
>       coredump: add coredump socket
>       pidfs, coredump: add PIDFD_INFO_COREDUMP
>       coredump: show supported coredump modes
>       coredump: validate socket name as it is written
>       selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure
>       selftests/coredump: add tests for AF_UNIX coredumps
>
>  fs/coredump.c                                     | 392 +++++++++++++----
>  fs/pidfs.c                                        |  79 ++++
>  include/linux/net.h                               |   1 +
>  include/linux/pidfs.h                             |  10 +
>  include/uapi/linux/pidfd.h                        |  22 +
>  net/unix/af_unix.c                                |  60 ++-
>  tools/testing/selftests/coredump/stackdump_test.c | 514 +++++++++++++++++++++-
>  tools/testing/selftests/pidfd/pidfd.h             |  23 +
>  8 files changed, 996 insertions(+), 105 deletions(-)
> ---
> base-commit: 4dd6566b5a8ca1e8c9ff2652c2249715d6c64217
> change-id: 20250429-work-coredump-socket-87cc0f17729c

Looks great to me and we can for sure use this in systemd-coredump,
thanks, for the series:

Acked-by: Luca Boccassi <luca.boccassi@...il.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ