netdev - Re: [PATCH v7 5/9] pidfs, coredump: add PIDFD_INFO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez3-=B1aTftz0srNjV7_t6QqGuk41LFAe6_qeXtXWL3+PA@mail.gmail.com>
Date: Thu, 15 May 2025 22:56:26 +0200
From: Jann Horn <jannh@...gle.com>
To: Christian Brauner <brauner@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, Daniel Borkmann <daniel@...earbox.net>, 
	Kuniyuki Iwashima <kuniyu@...zon.com>, Eric Dumazet <edumazet@...gle.com>, Oleg Nesterov <oleg@...hat.com>, 
	"David S. Miller" <davem@...emloft.net>, Alexander Viro <viro@...iv.linux.org.uk>, 
	Daan De Meyer <daan.j.demeyer@...il.com>, David Rheinsberg <david@...dahead.eu>, 
	Jakub Kicinski <kuba@...nel.org>, Jan Kara <jack@...e.cz>, 
	Lennart Poettering <lennart@...ttering.net>, Luca Boccassi <bluca@...ian.org>, Mike Yuan <me@...dnzj.com>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>, 
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org, 
	linux-security-module@...r.kernel.org, 
	Alexander Mikhalitsyn <alexander@...alicyn.com>
Subject: Re: [PATCH v7 5/9] pidfs, coredump: add PIDFD_INFO_COREDUMP

On Thu, May 15, 2025 at 12:04 AM Christian Brauner <brauner@...nel.org> wrote:
> Extend the PIDFD_INFO_COREDUMP ioctl() with the new PIDFD_INFO_COREDUMP
> mask flag. This adds the fields @coredump_mask and @coredump_cookie to
> struct pidfd_info.

FWIW, now that you're using path-based sockets and override_creds(),
one option may be to drop this patch and say "if you don't want
untrusted processes to directly connect to the coredumping socket,
just set the listening socket to mode 0000 or mode 0600"...

> Signed-off-by: Christian Brauner <brauner@...nel.org>
[...]
> diff --git a/fs/coredump.c b/fs/coredump.c
> index e1256ebb89c1..bfc4a32f737c 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
[...]
> @@ -876,8 +880,34 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>                         goto close_fail;
>                 }
>
> +               /*
> +                * Set the thread-group leader pid which is used for the
> +                * peer credentials during connect() below. Then
> +                * immediately register it in pidfs...
> +                */
> +               cprm.pid = task_tgid(current);
> +               retval = pidfs_register_pid(cprm.pid);
> +               if (retval) {
> +                       sock_release(socket);
> +                       goto close_fail;
> +               }
> +
> +               /*
> +                * ... and set the coredump information so userspace
> +                * has it available after connect()...
> +                */
> +               pidfs_coredump(&cprm);
> +
> +               /*
> +                * ... On connect() the peer credentials are recorded
> +                * and @cprm.pid registered in pidfs...

I don't understand this comment. Wasn't "@cprm.pid registered in
pidfs" above with the explicit `pidfs_register_pid(cprm.pid)`?

> +                */
>                 retval = kernel_connect(socket, (struct sockaddr *)(&addr),
>                                         addr_len, O_NONBLOCK | SOCK_COREDUMP);
> +
> +               /* ... So we can safely put our pidfs reference now... */
> +               pidfs_put_pid(cprm.pid);

Why can we safely put the pidfs reference now but couldn't do it
before the kernel_connect()? Does the kernel_connect() look up this
pidfs entry by calling something like pidfs_alloc_file()? Or does that
only happen later on, when the peer does getsockopt(SO_PEERPIDFD)?

>                 if (retval) {
>                         if (retval == -EAGAIN)
>                                 coredump_report_failure("Coredump socket %s receive queue full", addr.sun_path);
[...]
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index 3b39e471840b..d7b9a0dd2db6 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
[...]
> @@ -280,6 +299,13 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
>                 }
>         }
>
> +       if (mask & PIDFD_INFO_COREDUMP) {
> +               kinfo.mask |= PIDFD_INFO_COREDUMP;
> +               smp_rmb();

I assume I would regret it if I asked what these barriers are for,
because the answer is something terrifying about how we otherwise
don't have a guarantee that memory accesses can't be reordered between
multiple subsequent syscalls or something like that?

checkpatch complains about the lack of comments on these memory barriers.

> +               kinfo.coredump_cookie = READ_ONCE(pidfs_i(inode)->__pei.coredump_cookie);
> +               kinfo.coredump_mask = READ_ONCE(pidfs_i(inode)->__pei.coredump_mask);
> +       }
> +
>         task = get_pid_task(pid, PIDTYPE_PID);
>         if (!task) {
>                 /*
[...]
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index a9d1c9ba2961..053d2e48e918 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
[...]
> @@ -742,6 +743,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
>
>  struct unix_peercred {
>         struct pid *peer_pid;
> +       u64 cookie;

Maybe add a comment here documenting that for now, this is assumed to
be used exclusively for coredump sockets.


>         const struct cred *peer_cred;
>  };
>