Message-ID: <c89d068c-6d0b-45f3-a05d-ac92f3883fb1@arm.com>
Date: Wed, 3 Dec 2025 14:06:17 +0000
From: Christian Loehle <christian.loehle@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, linux-kernel@...r.kernel.org, pierre.gondois@....com,
kprateek.nayak@....com
Cc: qyousef@...alina.io, hongyan.xia2@....com, luis.machado@....com
Subject: Re: [PATCH 0/6 v8] sched/fair: Add push task mechanism and handle
more EAS cases
On 12/2/25 18:12, Vincent Guittot wrote:
> This is a subset of [1] (sched/fair: Rework EAS to handle more cases)
>
> [1] https://lore.kernel.org/all/20250314163614.1356125-1-vincent.guittot@linaro.org/
>
> The current Energy Aware Scheduler has some known limitations which have
> became more and more visible with features like uclamp as an example. This
> serie tries to fix some of those issues:
> - tasks stacked on the same CPU of a PD
This needs elaboration IMO, as "tasks stacked on the same CPU of a PD" isn't
really an issue per se. What's the scenario being fixed here?
> - tasks stuck on the wrong CPU.
>
> Patch 1 fixes the case where a CPU is wrongly classified as overloaded
> whereas it is capped to a lower compute capacity. This wrong classification
> can prevent periodic load balancer to select a group_misfit_task CPU
> because group_overloaded has higher priority.
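For readers not deep in fair.c: the priority in question is the group_type
ordering used when picking the busiest group. A simplified standalone model
of why the misclassification shadows the misfit CPU (enumerator set trimmed,
not the kernel code):

/*
 * Standalone model (not kernel code) of why a wrongly classified
 * group_overloaded CPU can shadow a group_misfit_task CPU: the periodic
 * load balancer picks the busiest group by group_type, and overloaded
 * ranks above misfit. The enumerator set is simplified.
 */
#include <stdio.h>

enum group_type {            /* higher value == higher balance priority */
	group_has_spare,
	group_fully_busy,
	group_misfit_task,   /* task too big for a capacity-capped CPU */
	group_overloaded,    /* more runnable tasks than the CPUs can take */
};

static enum group_type pick_busiest(enum group_type a, enum group_type b)
{
	return a > b ? a : b;
}

int main(void)
{
	/*
	 * A CPU capped to a lower compute capacity looks "overloaded"
	 * if the capping isn't accounted for ...
	 */
	enum group_type capped_cpu = group_overloaded;   /* misclassified */
	enum group_type misfit_cpu = group_misfit_task;  /* the one we want */

	/* ... so the balancer keeps choosing it over the misfit group. */
	printf("busiest: %s\n",
	       pick_busiest(capped_cpu, misfit_cpu) == group_overloaded ?
	       "group_overloaded (misfit CPU never selected)" :
	       "group_misfit_task");
	return 0;
}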
>
> Patch 2 removes the need of testing uclamp_min in cpu_overutilized to
> trigger the active migration of a task on another CPU.
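Again for reference, a userspace model of the idea behind the current check,
as far as I can tell: the real code goes through util_fits_cpu() and the
rq's uclamp aggregates, only the ~20% headroom of fits_capacity() is
mirrored here, and the capacities and clamp values are made up:

/*
 * Userspace model (not kernel code): a CPU is overutilized when the
 * request doesn't fit its capacity with the usual headroom. A tiny task
 * asking for a big uclamp_min inflates the "request" and today flips the
 * system into the overutilized state just so an active migration moves it.
 */
#include <stdbool.h>
#include <stdio.h>

static bool fits_capacity(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;	/* ~20% headroom */
}

static bool cpu_overutilized(unsigned long util, unsigned long uclamp_min,
			     unsigned long capacity)
{
	/* uclamp_min acts as a floor on the requested performance */
	unsigned long request = util > uclamp_min ? util : uclamp_min;

	return !fits_capacity(request, capacity);
}

int main(void)
{
	unsigned long little_cap = 446;		/* hypothetical little CPU */

	/* ~5% task, but it asks for uclamp_min = 600 */
	printf("plain util -> overutilized: %d\n",
	       cpu_overutilized(50, 0, little_cap));	/* 0: fits fine */
	printf("uclamp_min -> overutilized: %d\n",
	       cpu_overutilized(50, 600, little_cap));	/* 1: OU only due
							   to the hint */
	return 0;
}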
>
> Patch 3 prepares select_task_rq_fair() to be called without TTWU, Fork or
> Exec flags when we just want to look for a possible better CPU.
>
> Patch 4 adds push call back mecanism to fair scheduler but doesn't enable
> it.
nit: the "mecanism" typo is still here :)
>
> Patch 5 enable has_idle_core for !SMP system to track if there may be an
> idle CPU in the LLC.
s/!SMP/!SMT/
>
> Patch 6 adds some conditions to enable pushing runnable tasks for EAS:
> - when a task is stuck on a CPU and the system is not overutilized.
> - if there is a possible idle CPU when the system is overutilized.
I'd find it helpful to have the motivation spelled out more verbosely here.
Why are the tasks stuck? UCLAMP_MAX? Temporarily reduced capacity?
It would be nice to have a very concrete list of the scenarios/issues being
fixed and a description of how this patchset fixes them (e.g. current
behaviour, new behaviour, and why the new behaviour is the 'more' correct
one).
>
> More tests results will come later as I wanted to send the pachtset before
> LPC.
>
> I have kept Tbench figures as I added them in v7 but results are the same
> with the correct patch 6.
Ah, I was confused by this sentence at first; so for v8 both the hackbench
and tbench results are the same for baseline and patchset.
>
> Tbench on dragonboard rb5
> schedutil and EAS enabled
>
> # process tip +patchset
> 1 29.3(+/-0.3%) 29.2(+/-0.2%) +0%
> 2 61.1(+/-1.8%) 61.7(+/-3.2%) +1%
> 4 260.0(+/-1.7%) 258.8(+/-2.8%) -1%
> 8 1361.2(+/-3.1%) 1377.1(+/-1.9%) +1%
> 16 981.5(+/-0.6%) 958.0(+/-1.7%) -2%
I've done some analysis on tbench in the meantime, at least for the
1-process case, because I was puzzled by your v7 result. There are indeed
plenty of wakeups: in a 10s run I see 62806 tbench wakeups, with the
following distribution of the time from one wakeup to the next (a sketch of
a possible bucketing helper follows the numbers):
0 ms - 1 ms: 62157
1 ms - 2 ms: 44
2 ms - 3 ms: 32
3 ms - 4 ms: 5
4 ms - 5 ms: 10
5 ms - 6 ms: 6
6 ms - 7 ms: 2
7 ms - 8 ms: 2
8 ms - 9 ms: 3
12 ms - 13 ms: 2
15 ms - 16 ms: 1
16 ms - 17 ms: 1
24 ms - 25 ms: 1
95 ms - 96 ms: 1
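A possible bucketing helper, as a rough standalone sketch: it expects one
wakeup timestamp in seconds per line on stdin (e.g. extracted from
sched_wakeup events for the tbench task); input format and bucket width are
illustrative only:

/* Bucket inter-wakeup times into a 1 ms histogram. */
#include <stdio.h>

#define NR_BUCKETS 128	/* 0..127 ms, wider deltas land in the last bucket */

int main(void)
{
	double ts, prev = -1.0;
	unsigned long buckets[NR_BUCKETS] = { 0 };

	while (scanf("%lf", &ts) == 1) {
		if (prev >= 0.0) {
			long ms = (long)((ts - prev) * 1000.0);

			if (ms < 0)
				ms = 0;
			if (ms >= NR_BUCKETS)
				ms = NR_BUCKETS - 1;
			buckets[ms]++;
		}
		prev = ts;
	}

	for (int i = 0; i < NR_BUCKETS; i++)
		if (buckets[i])
			printf("%d ms - %d ms: %lu\n", i, i + 1, buckets[i]);

	return 0;
}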
> Hackbench didn't show any difference
hackbench is always OU once it has ramped up anyway, right? So this is
expected.
If I'm not mistaken, neither of the workloads is then likely to exercise the
changes in this series? (Both have more than enough wakeup events, and
hackbench is additionally OU, so EAS is mostly skipped.)
For reviewing it would then be helpful to have a workload that benefits from
this push mechanism, maybe at least one with and one without UCLAMP_MAX?
>
> Changes since v7:
> - Rebased on latest tip/sched/core
> - Fix some typos
> - Fix patch 6 mess
>
> Vincent Guittot (6):
> sched/fair: Filter false overloaded_group case for EAS
> sched/fair: Update overutilized detection
> sched/fair: Prepare select_task_rq_fair() to be called for new cases
> sched/fair: Add push task mechanism for fair
> sched/fair: Enable idle core tracking for !SMT
> sched/fair: Add EAS and idle cpu push trigger
>
> kernel/sched/fair.c | 350 +++++++++++++++++++++++++++++++++++-----
> kernel/sched/sched.h | 46 ++++--
> kernel/sched/topology.c | 2 +
> 3 files changed, 345 insertions(+), 53 deletions(-)
>
Powered by blists - more mailing lists