Message-ID: <Z4GiwRDrRVmr7kSR@slm.duckdns.org>
Date: Fri, 10 Jan 2025 12:44:17 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8] sched_ext: idle: Refresh idle masks during
 idle-to-idle transitions

On Fri, Jan 10, 2025 at 11:16:31PM +0100, Andrea Righi wrote:
> With the consolidation of put_prev_task/set_next_task(), see
> commit 436f3eed5c69 ("sched: Combine the last put_prev_task() and the
> first set_next_task()"), we are now skipping the transition between
> these two functions when the previous and the next tasks are the same.
> 
> As a result, the scx idle state of a CPU is updated only when
> transitioning to or from the idle thread. While this is generally
> correct, it can lead to uneven and inefficient core utilization in
> certain scenarios [1].
> 
> A typical scenario involves proactive wake-ups: scx_bpf_pick_idle_cpu()
> selects and marks an idle CPU as busy, followed by a wake-up via
> scx_bpf_kick_cpu(), without dispatching any tasks. In this case, the CPU
> continues running the idle thread, returns to idle, but remains marked
> as busy, preventing it from being selected again as an idle CPU (until a
> task eventually runs on it and releases the CPU).
> 
> For example, running a workload that uses 20% of each CPU, combined with
> an scx scheduler using proactive wake-ups, results in the following core
> utilization:
> 
>  CPU 0: 25.7%
>  CPU 1: 29.3%
>  CPU 2: 26.5%
>  CPU 3: 25.5%
>  CPU 4:  0.0%
>  CPU 5: 25.5%
>  CPU 6:  0.0%
>  CPU 7: 10.5%
> 
> To address this, refresh the idle state also in pick_task_idle(), during
> idle-to-idle transitions, but only trigger ops.update_idle() on actual
> state changes to prevent unnecessary updates to the scx scheduler and
> maintain balanced state transitions.
> 
> With this change in place, the core utilization in the previous example
> becomes the following:
> 
>  CPU 0: 18.8%
>  CPU 1: 19.4%
>  CPU 2: 18.0%
>  CPU 3: 18.7%
>  CPU 4: 19.3%
>  CPU 5: 18.9%
>  CPU 6: 18.7%
>  CPU 7: 19.3%
> 
> [1] https://github.com/sched-ext/scx/pull/1139
> 
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Andrea Righi <arighi@...dia.com>

Applied to sched_ext/for-6.13-fixes.

Thanks.

-- 
tejun
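
For readers following the quoted commit message, the behaviour it describes can be illustrated with a small userspace model in C. This is only a sketch under stated assumptions, not the kernel code and not the applied patch: `struct cpu_state`, `init_cpus()`, `pick_idle_cpu()`, `refresh_idle_state()` and `notify_update_idle()` are hypothetical stand-ins, and tracking the last notified state is just one plausible way to fire the callback "only on actual state changes".

```c
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical per-CPU model; not a kernel data structure. */
struct cpu_state {
	bool mask_idle;     /* bit consulted when selecting an idle CPU */
	bool notified_idle; /* last state reported to the scheduler callback */
};

static struct cpu_state cpus[NR_CPUS];
static int nr_update_idle_calls;

/* Stand-in for the scheduler's ops.update_idle() callback (assumption). */
static void notify_update_idle(int cpu, bool idle)
{
	(void)cpu;
	(void)idle;
	nr_update_idle_calls++;
}

/* All CPUs start idle, and the scheduler already knows it. */
static void init_cpus(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpus[cpu].mask_idle = true;
		cpus[cpu].notified_idle = true;
	}
	nr_update_idle_calls = 0;
}

/* Select the first idle CPU and mark it busy, or return -1 if none. */
static int pick_idle_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpus[cpu].mask_idle) {
			cpus[cpu].mask_idle = false;
			return cpu;
		}
	}
	return -1;
}

/*
 * Always refresh the selection mask, but invoke the callback only
 * when the reported state actually changes.
 */
static void refresh_idle_state(int cpu, bool idle)
{
	cpus[cpu].mask_idle = idle;
	if (cpus[cpu].notified_idle != idle) {
		cpus[cpu].notified_idle = idle;
		notify_update_idle(cpu, idle);
	}
}
```

In this model, a proactive wake-up is `pick_idle_cpu()` followed by the CPU re-entering idle without running a task. Calling `refresh_idle_state(cpu, true)` at that point puts the CPU back in the mask without a callback, while a real idle-to-busy-to-idle cycle produces exactly two callbacks.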
