Message-ID: <CAADnVQJp9FkPDA7oo-+yZ0SKFbE6w7FzARosLgzLmH74Vv+dow@mail.gmail.com>
Date: Wed, 29 Oct 2025 16:53:07 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Tejun Heo <tj@...nel.org>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>, Song Liu <song@...nel.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, LKML <linux-kernel@...r.kernel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...nel.org>, 
	Shakeel Butt <shakeel.butt@...ux.dev>, Johannes Weiner <hannes@...xchg.org>, 
	Andrii Nakryiko <andrii@...nel.org>, JP Kobryn <inwardvessel@...il.com>, 
	linux-mm <linux-mm@...ck.org>, 
	"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	Martin KaFai Lau <martin.lau@...nel.org>, Kumar Kartikeya Dwivedi <memxor@...il.com>
Subject: Re: [PATCH v2 02/23] bpf: initial support for attaching struct ops to cgroups

On Wed, Oct 29, 2025 at 3:53 PM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Wed, Oct 29, 2025 at 03:43:39PM -0700, Alexei Starovoitov wrote:
> ...
> > I think the general bpf philosophy is that load and attach are two
> > separate steps. For struct-ops it's almost there, but not quite.
> > struct-ops shouldn't be an exception.
> > The bpf infra should be able to load a set of progs (aka struct-ops)
> > and attach it with a link to different entities. Like cgroups.
> > I think sched-ext should do that too. Even if there is no use case
> > today for the same sched-ext in two different cgroups.
>
> I'm not sure it's just that there's no use case.

I think there will be a use case for sched-ext as well,
just the current way the scheds are written is too specific.
There is cgroup local storage, so scheds can certainly
store whatever state there.
Potentially we can improve UX further by utilizing __thread on bpf.c
side in some way.

> - How would recursion work with private stacks? Aren't those attached to
>   each BPF program?

yes. private stack is per prog, but why does it matter?
I'm not suggesting that the same prog be attached at different
levels of the cgroup hierarchy, because such configuration
will indeed trigger recursion prevention logic (with or without private
stack).
But having one logical sched-ext prog set to manage tasks
in container A and in container B makes sense as a use case to me
where A and B are different cgroups.
DSQs can be cgroup scoped too.

> - Wouldn't that also complicate attributing kfunc calls to the handle
>   instance?

you mean the whole prog_assoc stuff?
That's orthogonal. tracing progs are global so there is
no perfect place to associate them with. struct-ops map
is the best we can do today, but ideally it's run_ctx
that should be per-attachment. Like cookie.

> If there is one struct_ops per cgroup, the oom kill kfunc can
>   look that up and then verify that the struct_ops has authority over the
>   target process. Multiple attachments can work too but that'd require
>   iterating all attachments, right?

Are you talking about the bpf_oom_kill_process() kfunc from this patch set?
I don't think it needs any changes. The oom context is passed into the prog
and passed along to the kfunc. The cgroup origin doesn't matter.