linux-kernel - Re: [PATCH v2 02/23] bpf: initial support for attaching struct ops to cgroups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAADnVQJ+4a97bp26BOpD5A9LOzfJ+XxyNt4bdG8n7jaO6+nV3Q@mail.gmail.com>
Date: Wed, 29 Oct 2025 15:43:39 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Song Liu <song@...nel.org>, Tejun Heo <tj@...nel.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, LKML <linux-kernel@...r.kernel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...nel.org>, 
	Shakeel Butt <shakeel.butt@...ux.dev>, Johannes Weiner <hannes@...xchg.org>, 
	Andrii Nakryiko <andrii@...nel.org>, JP Kobryn <inwardvessel@...il.com>, 
	linux-mm <linux-mm@...ck.org>, 
	"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	Martin KaFai Lau <martin.lau@...nel.org>, Kumar Kartikeya Dwivedi <memxor@...il.com>
Subject: Re: [PATCH v2 02/23] bpf: initial support for attaching struct ops to cgroups

On Wed, Oct 29, 2025 at 2:53 PM Roman Gushchin <roman.gushchin@...ux.dev> wrote:
>
> Song Liu <song@...nel.org> writes:
>
> > Hi Tejun,
> >
> > On Wed, Oct 29, 2025 at 1:36 PM Tejun Heo <tj@...nel.org> wrote:
> >>
> >> On Wed, Oct 29, 2025 at 01:25:52PM -0700, Roman Gushchin wrote:
> >> > > BTW, for sched_ext sub-sched support, I'm just adding cgroup_id to
> >> > > struct_ops, which seems to work fine. It'd be nice to align on the same
> >> > > approach. What are the benefits of doing this through fd?
> >> >
> >> > Then you can attach a single struct ops to multiple cgroups (or Idk
> >> > sockets or processes or some other objects in the future).
> >> > And IMO it's just a more generic solution.
> >>
> >> I'm not very convinced that sharing a single struct_ops instance across
> >> multiple cgroups would be all that useful. If you map this to normal
> >> userspace programs, a given struct_ops instance is package of code and all
> >> the global data (maps). ie. it's not like running the same program multiple
> >> times against different targets. It's more akin to running a single program
> >> instance which can handle multiple targets.
> >>
> >> Maybe that's useful in some cases, but that program would have to explicitly
> >> distinguish the cgroups that it's attached to. I have a hard time imagining
> >> use cases where a single struct_ops has to service multiple disjoint cgroups
> >> in the hierarchy and it ends up stepping outside of the usual operation
> >> model of cgroups - commonality being expressed through the hierarchical
> >> structure.
> >
> > How about we pass a pointer to mem_cgroup (and/or related pointers)
> > to all the callbacks in the struct_ops? AFAICT, in-kernel _ops structures like
> > struct file_operations and struct tcp_congestion_ops use this method. And
> > we can actually implement struct tcp_congestion_ops in BPF. With the
> > struct tcp_congestion_ops model, the struct_ops map and the struct_ops
> > link are both shared among multiple instances (sockets).
>
> +1 to this.
> I agree it might be debatable when it comes to cgroups, but when it comes to
> sockets or similar objects, having a separate struct ops per object
> isn't really an option.

I think the general bpf philosophy that load and attach are two
separate steps. For struct-ops it's almost there, but not quite.
struct-ops shouldn't be an exception.
The bpf infra should be able to load a set of progs (aka struct-ops)
and attach it with a link to different entities. Like cgroups.
I think sched-ext should do that too. Even if there is no use case
today for the same sched-ext in two different cgroups.
For bpf-oom I can imagine a use case where container management sw
would pre-load struct-ops and then attach it later to different
containers depending on container configs. These container might
be peers in hierarchy, but attaching to their parent won't be
equivalent, since other peers might not need that bpf-oom management.
The "workaround" could be to create another cgroup layer
between parent and container, but that becomes messy, since now
there is a cgroup only for the purpose of attaching bpf-oom to it.

Whether struct-ops link attach is using cgroup_fd or cgroup_id
is debatable. I think FD is cleaner.