linux-kernel - Re: [PATCH v4] sched_ext: Trigger ops.update_idle() from pick_task

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zw6KuWHPn9d6GWOK@gpd3>
Date: Tue, 15 Oct 2024 17:31:05 +0200
From: Andrea Righi <andrea.righi@...ux.dev>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v4] sched_ext: Trigger ops.update_idle() from
 pick_task_idle()

On Tue, Oct 15, 2024 at 04:41:26AM -1000, Tejun Heo wrote:
> Hello, Andrea.
> 
> On Tue, Oct 15, 2024 at 01:15:39PM +0200, Andrea Righi wrote:
> ...
> > For example, a BPF scheduler might use logic like the following to keep
> > the CPU active under specific conditions:
> > 
> > void BPF_STRUCT_OPS(sched_update_idle, s32 cpu, bool idle)
> > {
> > 	if (!idle)
> > 		return;
> > 	if (condition)
> > 		scx_bpf_kick_cpu(cpu, 0);
> > }
> > 
> > A call to scx_bpf_kick_cpu() wakes up the CPU, so in theory,
> > ops.update_idle() should be triggered again until the condition becomes
> > false. However, this doesn't happen, and scx_bpf_kick_cpu() doesn't
> > produce the expected effect.
> 
> I thought more about this scenario and I'm not sure anymore whether we want
> to guarantee that scx_bpf_kick_cpu() is followed by update_idle(cpu, true).
> Here are a couple considerations:
> 
> - As implemented, the transtions aren't balanced. ie. When the above
>   happens, update_idle(cpu, true) will be generated multiple times without
>   intervening update_idle(cpu, false). We can insert artificial false
>   transtions but that's cumbersome and...

Agreed, I wouldn't suggest adding artificial false events.

> 
> - For the purpose of determining whether a CPU is idle for e.g. task
>   placement from ops.select_cpu(). The CPU *should* be considered idle in
>   this polling state.
> 
> Overall, it feels a bit contrived to generate update_idle() events
> consecutively for this. If a scheduler wants to poll in idle state, can't it
> do something like the following?
> 
> - Trigger kick from update_idle(cpu, true) and remember that the CPU is in
>   the polling state.
> 
> - Keep kicking from ops.dispatch() until polling state is cleared.
> 
> As what kick() guarnatees is at least one dispatch event after kicking, this
> is guaranteed to be correct and the control flow, while a bit more
> complicated, makes sense - it triggers dispatch on idle transition and keeps
> dispatching in the idle state.
> 
> What do you think?

That seems to work in theory, I'll run some tests to confirm that it
also works in practice. :)

It looks definitely nicer than triggering multiple ops.update_idle()
from the kernel and we can maintain the proper semantic of triggering
update_idle() only on actual idle state changes.

Thanks,
-Andrea