Message-ID: <c31bc7c2-52b4-4a91-ae0f-259411145432@nvidia.com>
Date: Sat, 6 Dec 2025 09:56:50 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: John Stultz <jstultz@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Qais Yousef <qyousef@...alina.io>,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Valentin Schneider <vschneid@...hat.com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>,
 Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>,
 Boqun Feng <boqun.feng@...il.com>, "Paul E. McKenney" <paulmck@...nel.org>,
 Metin Kaya <Metin.Kaya@....com>, Xuewen Yan <xuewen.yan94@...il.com>,
 K Prateek Nayak <kprateek.nayak@....com>,
 Thomas Gleixner <tglx@...utronix.de>,
 Daniel Lezcano <daniel.lezcano@...aro.org>, Tejun Heo <tj@...nel.org>,
 David Vernet <void@...ifault.com>, Andrea Righi <arighi@...dia.com>,
 Changwoo Min <changwoo@...lia.com>, sched-ext@...ts.linux.dev,
 kernel-team@...roid.com
Subject: Re: [RFC][PATCH] sched/ext: Split curr|donor references properly

Hi John,

On 12/5/2025 11:49 PM, John Stultz wrote:
> On Fri, Dec 5, 2025 at 6:47 PM Joel Fernandes <joelagnelf@...dia.com> wrote:
>> On Sat, Dec 06, 2025 at 12:14:45AM +0000, John Stultz wrote:
>>> With proxy-exec, we want to do the accounting against the donor
>>> most of the time. Without proxy-exec, there should be no
>>> difference as the rq->donor and rq->curr are the same.
>>>
>>> So rework the logic to reference the rq->donor where appropriate.
>>>
>>> Also add donor info to scx_dump_state()
>>>
>>> Since CONFIG_SCHED_PROXY_EXEC currently depends on
>>> !CONFIG_SCHED_CLASS_EXT, this should have no effect
>>> (other than the extra donor output in scx_dump_state),
>>> but this is one step needed to eventually remove that
>>> constraint for proxy-exec.
>>>
>>> Just wanted to send this out for early review prior to LPC.
>>>
>>> Feedback or thoughts would be greatly appreciated!
>>
>> Hi John,
>>
>> I'm wondering if this will work well for BPF tasks because my understanding
>> is that some scheduler BPF programs also monitor runtime statistics. If they
>> are unaware of proxy execution, how will it work?
> 
> Good question! Be sure to come to my LPC talk on this next week! :)
> https://lpc.events/event/19/contributions/2032/

Sure, I'll try to make it to the talk, hopefully with no conflicts. :)

> 
>> I don't see any code in the patch that passes the donor information to the
>> BPF ops, for instance. I would really like the SCX folks to chime in before
>> we can move this patch forward. Thanks for marking it as an RFC.
> 
> Oh yes, this RFC is intended to just be something to open initial
> discussion for the session next week. I'm very much hoping to get
> further thoughts on it, in person, next week.

Cool!

>> We need to get a handle on how a scheduler BPF program will pass information
>> about the donor to the currently executing task. If we can make this happen
>> transparently, that's ideal. Otherwise, we may have to pass both the donor
>> task and the currently executing task to the BPF ops.
> 
> So, one thing about proxy-exec is that the class schedulers pretty much
> keep their existing behavior. It's just that the core scheduler may
> not actually run what they pick.

Didn't we have complexities with RT, push-pull lists and such? Those were
class-specific, no?

> That's ok, as the task they pick becomes the rq->donor that we want to
> use for pretty much all the scheduling accounting (the exception being
> the cputime accounting necessary for cputimers on the running task to
> behave sanely as well as top output - as you have helped identify
> earlier).  So this patch is just shifting the class scheduler to
> utilize the donor pointer instead of curr, so we are consistent in the
> proxy case.
> 
> As for the concern about communicating the split context (rq->donor vs
> rq->curr) to the bpf program, to my understanding, the DSQ abstraction
> seems to make that unnecessary. It provides a general enough interface
> for the bpf logic, that it seems we only have to worry about the split
> context on the sched/ext.c logic side as it processes the DSQ.
> That said, I'm no sched_ext expert, so I'm hoping at LPC we can find
> any edge cases that do need to be dealt with.

Right, so it is exactly these pointer shifts I was concerned about. Runtime
callbacks such as 'stopping' [1] directly use p->slice.

[1] https://github.com/sched-ext/scx/blob/main/scheds/c/scx_simple.bpf.c#L124

So we have to pass the correct 'p' to these callbacks. Did I miss something
about your patch though that handles this?
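
To make the concern concrete, here is a rough userspace model (not kernel or
BPF code) of what I mean. Everything in it is made up for illustration: the
struct layouts, SLICE_DFL, model_tick(), model_stopping() and run_and_stop()
are hypothetical names, and the vtime formula only imitates the style of the
charge in scx_simple's stopping callback. The point is just that whichever
'p' is handed to the callback is the task whose slice and vtime get charged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical default slice in ns; stands in for the scheduler's default. */
#define SLICE_DFL 20000000ULL

/* Simplified stand-in for the per-task state a BPF scheduler looks at. */
struct task {
	const char *name;
	uint64_t slice;   /* remaining time slice, ns */
	uint64_t weight;  /* scheduling weight, 100 == default */
	uint64_t vtime;   /* weighted virtual time charged so far */
};

/* Simplified stand-in for the split context under proxy-exec. */
struct rq {
	struct task *curr;   /* task whose code actually executes */
	struct task *donor;  /* task the class scheduler picked */
};

/* Consume 'ns' from the slice of the task being charged. */
static void model_tick(struct task *p, uint64_t ns)
{
	p->slice = ns > p->slice ? 0 : p->slice - ns;
}

/* Charge vtime from the used part of the slice, scaled by inverse weight. */
static void model_stopping(struct task *p)
{
	p->vtime += (SLICE_DFL - p->slice) * 100 / p->weight;
}

/* Run for 'ns', then invoke the callback with either the donor or curr. */
static void run_and_stop(struct rq *rq, uint64_t ns, bool charge_donor)
{
	struct task *p = charge_donor ? rq->donor : rq->curr;

	model_tick(p, ns);
	model_stopping(p);
}
```

With a blocked waiter as rq->donor and the lock owner as rq->curr, calling
run_and_stop(&rq, 5000000, true) charges the 5ms to the waiter and leaves the
owner untouched; passing false charges the owner instead. A BPF scheduler that
reads the slice of the 'p' it is given sees one or the other depending on
which pointer the core hands over.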

If this is indeed a problem, maybe one way to get around it initially is to make
'proxy exec' an opt-in for BPF schedulers. But then we'd have to handle a hybrid
world.

At a high level, my understanding is that BPF schedulers have a lot of say in
how to schedule, including precise time slice and preemption control (how much
control depends on the scheduler and on performance considerations). You can in
fact have your own 'userland' queues that the kernel is unaware of, IIUC. I am
not sure if proxy exec will transparently work for all those use cases. It will
probably work properly only when the BPF scheduling in userland is simple and
most of the scheduling is done by non-BPF kernel code.

Maybe this isn't a problem at all, but I thought I'd double check. :)

> Thanks again for the thought here! Always appreciate your feedback!

Sure, any time John! I am glad to see your patches continuously flowing.

cheers,

 - Joel

