[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zw6KuWHPn9d6GWOK@gpd3>
Date: Tue, 15 Oct 2024 17:31:05 +0200
From: Andrea Righi <andrea.righi@...ux.dev>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v4] sched_ext: Trigger ops.update_idle() from
pick_task_idle()
On Tue, Oct 15, 2024 at 04:41:26AM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> On Tue, Oct 15, 2024 at 01:15:39PM +0200, Andrea Righi wrote:
> ...
> > For example, a BPF scheduler might use logic like the following to keep
> > the CPU active under specific conditions:
> >
> > void BPF_STRUCT_OPS(sched_update_idle, s32 cpu, bool idle)
> > {
> > if (!idle)
> > return;
> > if (condition)
> > scx_bpf_kick_cpu(cpu, 0);
> > }
> >
> > A call to scx_bpf_kick_cpu() wakes up the CPU, so in theory,
> > ops.update_idle() should be triggered again until the condition becomes
> > false. However, this doesn't happen, and scx_bpf_kick_cpu() doesn't
> > produce the expected effect.
>
> I thought more about this scenario and I'm not sure anymore whether we want
> to guarantee that scx_bpf_kick_cpu() is followed by update_idle(cpu, true).
> Here are a couple considerations:
>
> - As implemented, the transtions aren't balanced. ie. When the above
> happens, update_idle(cpu, true) will be generated multiple times without
> intervening update_idle(cpu, false). We can insert artificial false
> transtions but that's cumbersome and...
Agreed, I wouldn't suggest adding artificial false events.
>
> - For the purpose of determining whether a CPU is idle for e.g. task
> placement from ops.select_cpu(). The CPU *should* be considered idle in
> this polling state.
>
> Overall, it feels a bit contrived to generate update_idle() events
> consecutively for this. If a scheduler wants to poll in idle state, can't it
> do something like the following?
>
> - Trigger kick from update_idle(cpu, true) and remember that the CPU is in
> the polling state.
>
> - Keep kicking from ops.dispatch() until polling state is cleared.
>
> As what kick() guarnatees is at least one dispatch event after kicking, this
> is guaranteed to be correct and the control flow, while a bit more
> complicated, makes sense - it triggers dispatch on idle transition and keeps
> dispatching in the idle state.
>
> What do you think?
That seems to work in theory, I'll run some tests to confirm that it
also works in practice. :)
It looks definitely nicer than triggering multiple ops.update_idle()
from the kernel and we can maintain the proper semantic of triggering
update_idle() only on actual idle state changes.
Thanks,
-Andrea
Powered by blists - more mailing lists