linux-kernel - Re: [PATCH] sched/fair: remove util

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cb0ddea0-2c85-f0ec-f726-14a29cf51fad@arm.com>
Date:   Mon, 24 Jul 2023 23:11:38 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Qais Yousef <qyousef@...alina.io>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com, rostedt@...dmis.org,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: remove util_est boosting

On 24/07/2023 15:06, Vincent Guittot wrote:
> On Fri, 21 Jul 2023 at 18:09, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>
>> On 12/07/2023 17:30, Vincent Guittot wrote:
>>> On Tue, 11 Jul 2023 at 17:47, Qais Yousef <qyousef@...alina.io> wrote:
>>>>
>>>> On 07/06/23 15:51, Vincent Guittot wrote:

[...]

>> -->
>>
>> I need the util_est = max(util_est, runnable) further down as well. Just
>> want to fetch runnable only once.
>>
>> util = 50, task_util = 5, util_est = 60, task_util_est = 10, runnable = 70
>>
>> max(70 + 5, 60 + 10) != max (70 + 5, 70 + 10) when dst_cpu == cpu
>>
> 
> Hmm, I don't get your point here. Why should they be equal ?
> 
> Below is a example to describe my problem:
> 
> task A with util_avg=200 util_est=300 runnable=200
> task A is attached to CPU0 so it contributes to CPU0's util_avg and
> runnable_avg.
> 
> In eenv_pd_max_util() we call cpu_util(cpu, p, dst_cpu, 1) to get the
> max utilization and the OPP to use to compute energy.
> 
> Let say that there is nothing else running on CPU0 and CPU1 and the
> both belong to the same performance domain so
> CPU0 util_avg= 200 util_est=0 runnable_avg=200
> CPU1 util_avg=0 util_est=0 runnable_avg=0
> 
> For CPU0, cpu_util(cpu, p, dst_cpu, 1) will return (200 + 300) = 500
> For CPU1, cpu_util(cpu, p, dst_cpu, 1) will return (0 + 300) = 300
> 
> If there is an OPP with a capacity between these 2 values, CPU1 will
> use a lower OPP than CPU0 and its computed energy will be lower.
> 
> The condition  if (max_spare_cap_cpu >= 0 && max_spare_cap >
> prev_spare_cap) filters some cases when CPU0 and CPU1 have the exact
> same spare capacity. But we often see a smaller spare capacity for
> CPU0 because of small side activities like cpufreq, timer, irq, rcu
> ... The difference is often only 1 but enough to bypass the condition
> above. task A will migrate to CPU1 whereas there is no need. Then it
> will move back to CPU0 once CPU1 will have a smaller spare capacity
> 
> I ran a test on snapdragon RB5 with the latest tip/sched/core. I start
> 3 tasks: 1 large enough to be on medium CPUs and 2 small enough to
> stay on little CPUs during 30 seconds
> With tip/sched/core, the 3 tasks are migrating around 3665
> With the patch, there is only 8 migration at the beginning of the test

I agree with this. The fact that cfs_rq->avg.runnable_avg contains
blocked contributions from task A makes it unsuitable for the util_est
(no blocked contributions) if condition (dst_cpu == cpu) since we don't
want to add A's util_est to util_est to simulate during wakeup that A is
enqueued.

>> <--
>>
>> But I assume your point is that:
>>
>> 7327       if (boost)
>> 7328           util_est = max(util_est, runnable);
>>
>> 7356       if (dst_cpu == cpu)                                   <-- (1)
>> 7357           util_est += _task_util_est(p);
>> 7358       else if (p && unlikely(task_on_rq_queued(p) || current == p))
>> 7359           lsub_positive(&util_est, _task_util_est(p));
>> 7360
>> 7361       util = max(util, util_est);
>>
>> --> (1) doesn't work anymore in case `util_est == runnable`.
>>
>> It will break the assumption for the if condition depicted in
>> cpu_util()'s comment:
> 
> exactly

OK.

>>
>> 7331  * During wake-up (2) @p isn't enqueued yet and doesn't contribute
>> 7332  * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
>> 7333  * If @dst_cpu == @cpu add it to "simulate" cpu_util after @p
>> 7334  * has been enqueued.
>>
>> (2) eenv_pd_max_util() and find_energy_efficient_cpu() call-site.
>>
>> <---
>>
>> Rerunning Jankbench tests on Pix6 will tell if boosting util_avg instead
>> of both will still show the anticipated results. Likelihood is high that
>> it will since we do `util = max(util, util_est)` at the end of cpu_util().
> 
>  I think the same

Reran the Jankbench test with the patch (fix) on exactly the same
platform (Pixel6, Android 12) I used for v3 (base, runnable):

https://lkml.kernel.org/r/20230515115735.296329-1-dietmar.eggemann@arm.com

Max_frame_duration:
+-----------------+------------+
|     kernel      | value [ms] |
+-----------------+------------+
|      base       |   163.1    |
|    runnable     |   162.0    |
|       fix       |   157.1    |
+-----------------+------------+

Mean_frame_duration:
+-----------------+------------+----------+
|     kernel      | value [ms] | diff [%] |
+-----------------+------------+----------+
|      base       |    18.0    |    0.0   |
|    runnable     |    12.7    |  -29.43  |
|       fix       |    13.0    |  -27.78  |
+-----------------+------------+----------+

Jank percentage (Jank deadline 16ms):
+-----------------+------------+----------+
|     kernel      | value [%]  | diff [%] |
+-----------------+------------+----------+
|      base       |     3.6    |    0.0   |
|    runnable     |     1.0    |  -68.86  |
|       fix       |     1.0    |  -68.86  |
+-----------------+------------+----------+

Power usage [mW] (total - all CPUs):
+-----------------+------------+----------+
|     kernel      | value [mW] | diff [%] |
+-----------------+------------+----------+
|      base       |    129.5   |    0.0   |
|    runnable     |    134.3   |   3.71   |
|       fix       |    129.9   |   0.31   |
+-----------------+------------+----------+

Test results look good to me.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@....com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@....com>

[...]