Message-ID: <Z4GiwRDrRVmr7kSR@slm.duckdns.org>
Date: Fri, 10 Jan 2025 12:44:17 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8] sched_ext: idle: Refresh idle masks during idle-to-idle transitions
On Fri, Jan 10, 2025 at 11:16:31PM +0100, Andrea Righi wrote:
> With the consolidation of put_prev_task/set_next_task(), see
> commit 436f3eed5c69 ("sched: Combine the last put_prev_task() and the
> first set_next_task()"), we are now skipping the transition between
> these two functions when the previous and the next tasks are the same.
>
> As a result, the scx idle state of a CPU is updated only when
> transitioning to or from the idle thread. While this is generally
> correct, it can lead to uneven and inefficient core utilization in
> certain scenarios [1].
>
> A typical scenario involves proactive wake-ups: scx_bpf_pick_idle_cpu()
> selects and marks an idle CPU as busy, followed by a wake-up via
> scx_bpf_kick_cpu(), without dispatching any tasks. In this case, the CPU
> continues running the idle thread, returns to idle, but remains marked
> as busy, preventing it from being selected again as an idle CPU (until a
> task eventually runs on it and releases the CPU).
>
> For example, running a workload that uses 20% of each CPU, combined with
> an scx scheduler using proactive wake-ups, results in the following core
> utilization:
>
> CPU 0: 25.7%
> CPU 1: 29.3%
> CPU 2: 26.5%
> CPU 3: 25.5%
> CPU 4: 0.0%
> CPU 5: 25.5%
> CPU 6: 0.0%
> CPU 7: 10.5%
>
> To address this, refresh the idle state also in pick_task_idle(), during
> idle-to-idle transitions, but only trigger ops.update_idle() on actual
> state changes to prevent unnecessary updates to the scx scheduler and
> maintain balanced state transitions.
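
For readers following along, the behaviour described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel patch: the names idle_mask, refresh_idle_state(), notify_update_idle(), kick_without_work() and idle_cpus_after_kicks() are hypothetical, and the do_notify flag is an assumed way of expressing "refresh the mask, but only call ops.update_idle() on a real state change".

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical model state: true means the CPU is advertised as idle. */
static bool idle_mask[NR_CPUS];
/* Counts calls to the stand-in for ops.update_idle(). */
static int update_idle_calls;

/* Stand-in for ops.update_idle(); the real callback belongs to the scx scheduler. */
void notify_update_idle(int cpu, bool idle)
{
	(void)cpu;
	(void)idle;
	update_idle_calls++;
}

/* Always refresh the mask; notify only when the caller reports a real state change. */
void refresh_idle_state(int cpu, bool idle, bool do_notify)
{
	idle_mask[cpu] = idle;
	if (do_notify)
		notify_update_idle(cpu, idle);
}

/* Models scx_bpf_pick_idle_cpu(): select an idle CPU and mark it busy. */
int pick_idle_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (idle_mask[cpu]) {
			idle_mask[cpu] = false;
			return cpu;
		}
	}
	return -1;
}

/*
 * Models a proactive wake-up with nothing dispatched: the CPU leaves and
 * re-enters the idle thread (an idle-to-idle transition). With refresh
 * enabled, the mask is restored without notifying the scheduler, because
 * the CPU never actually stopped being idle.
 */
void kick_without_work(int cpu, bool refresh_on_idle_to_idle)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (refresh_on_idle_to_idle)
		refresh_idle_state(cpu, true, false);
}

/* Pick and kick NR_CPUS times, then report how many CPUs are still selectable as idle. */
int idle_cpus_after_kicks(bool refresh_on_idle_to_idle)
{
	int count = 0;

	update_idle_calls = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		idle_mask[cpu] = true;

	for (int i = 0; i < NR_CPUS; i++) {
		int cpu = pick_idle_cpu();

		if (cpu < 0)
			break;
		kick_without_work(cpu, refresh_on_idle_to_idle);
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		count += idle_mask[cpu];
	return count;
}
```

In this model, calling idle_cpus_after_kicks(false) leaves every kicked CPU stuck as busy, which mirrors the unused CPUs in the first utilization table, while idle_cpus_after_kicks(true) keeps all CPUs selectable and never invokes the notification stand-in.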
>
> With this change in place, the core utilization in the previous example
> becomes the following:
>
> CPU 0: 18.8%
> CPU 1: 19.4%
> CPU 2: 18.0%
> CPU 3: 18.7%
> CPU 4: 19.3%
> CPU 5: 18.9%
> CPU 6: 18.7%
> CPU 7: 19.3%
>
> [1] https://github.com/sched-ext/scx/pull/1139
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Andrea Righi <arighi@...dia.com>
Applied to sched_ext/for-6.13-fixes.
Thanks.
--
tejun