lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250310-hinzog-unzweifelhaft-9105447ec12d@brauner>
Date: Mon, 10 Mar 2025 12:27:00 +0100
From: Christian Brauner <brauner@...nel.org>
To: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
Cc: kuniyu@...zon.com, linux-kernel@...r.kernel.org, 
	netdev@...r.kernel.org, cgroups@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Willem de Bruijn <willemb@...gle.com>, 
	Leon Romanovsky <leon@...nel.org>, Arnd Bergmann <arnd@...db.de>, 
	Lennart Poettering <mzxreary@...inter.de>, Luca Boccassi <bluca@...ian.org>, Tejun Heo <tj@...nel.org>, 
	Johannes Weiner <hannes@...xchg.org>, Michal Koutný <mkoutny@...e.com>, 
	Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo
 API to retreive socket's peer cgroup id

On Mon, Mar 10, 2025 at 09:52:31AM +0100, Christian Brauner wrote:
> On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> > 1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo
> > 2. Add SO_PEERCGROUPID which allows to retrieve socket's peer cgroup id
> > 3. Add SO_PEERCGROUPID kselftest
> > 
> > Generally speaking, this API allows race-free resolution of socket's peer cgroup id.
> > Currently, to do that SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence
> > is used which is racy.
> > 
> > As we don't add any new state to the socket itself there is no potential locking issues
> > or performance problems. We use already existing sk->sk_cgrp_data.
> > 
> > We already have analogical interfaces to retrieve this
> > information:
> > - inet_diag: INET_DIAG_CGROUP_ID
> > - eBPF: bpf_sk_cgroup_id
> > 
> > Having getsockopt() interface makes sense for many applications, because using eBPF is
> > not always an option, while inet_diag has obvious complexety and performance drawbacks
> > if we only want to get this specific info for one specific socket.
> > 
> > Idea comes from UAPI kernel group:
> > https://uapi-group.org/kernel-features/
> > 
> > Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for proposing
> > and exchanging ideas about this.
> 
> Seems fine to me,
> Reviewed-by: Christian Brauner <brauner@...nel.org>

One wider conceptual comment.

Starting with v6.15 it is possible to retrieve exit information from
pidfds even after the task has been reaped. So if someone opens a pidfd
via pidfd_open() and that task gets reaped by the parent it is possible
to call PIDFD_INFO_EXIT and you can retrieve the exit status and the
cgroupid of the task that was reaped. That works even after all task
linkage has been removed from struct pid.

The system call api doesn't allow the creation of pidfds for reaped
processes. It wouldn't be possible as the pid number will have already
been released.

Both SO_PEERPIDFD and SO_PASSPIDFD also don't allow the creation of
pidfds for already reaped peers or senders.

But that doesn't have to be the case since we always have the struct pid
available. So it's entirely possible to hand out a pidfd to a reaped
process if it's guaranteed that exit information is available. If it's
not then this would be a bug.

The trick is that when a struct pid is stashed it needs to also allocate
a pidfd inode. That could simply be done by a helper get_pidfs_pid()
which takes a reference to the struct pid and ensures that space for
recording exit information is available.

With that done SO_PEERCGROUPID isn't needed per se as it will be
possible to get the cgroupid and exit status from the pidfd.

>From a cursory look that should be possible to do without too much work.
I'm just pointing this out as an alternative.

There's one restriction that this would be subject to that
SO_PEERCGROUPID isn't. The SO_PEERCGROUPID is exposed for any process
whereas PIDFD_GET_INFO ioctls (that includes the PIDFD_INFO_EXIT) option
is only available for processes within the receivers pid namespace
hierarchy.

But in any case, enabling pidfds for such reaped processes might still
be useful since it would mean receivers could get exit information for
pidfds within their pid namespace hierarchy.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ