Date:   Fri, 13 Jan 2023 15:28:49 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Kajetan Puchalski <kajetan.puchalski@....com>
Cc:     mingo@...nel.org, peterz@...radead.org, dietmar.eggemann@....com,
        qyousef@...alina.io, rafael@...nel.org, viresh.kumar@...aro.org,
        vschneid@...hat.com, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org, lukasz.luba@....com, wvw@...gle.com,
        xuewen.yan94@...il.com, han.lin@...iatek.com,
        Jonathan.JMChen@...iatek.com
Subject: Re: [PATCH v2] sched/fair: unlink misfit task from cpu overutilized

Hi Kajetan,

On Fri, 13 Jan 2023 at 14:50, Kajetan Puchalski
<kajetan.puchalski@....com> wrote:
>
> Hi,
>
> > By taking into account uclamp_min, the 1:1 relation between task misfit
> > and cpu overutilized is no longer true, as a task with a small util_avg
> > may not fit a high capacity cpu because of the uclamp_min constraint.
> >
> > Add a new state in util_fits_cpu() to reflect the case where the task would
> > fit a CPU except for the uclamp_min hint, which is a performance requirement.
> >
> > Use -1 to reflect that a CPU doesn't fit only because of uclamp_min, so this
> > new value can be used to take additional action and select the best CPU that
> > doesn't match the uclamp_min hint.
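
To make the new return convention concrete, here is a minimal, self-contained
userspace model of the three-valued check. This is an illustration only: the
names are made up, and it deliberately ignores the margin and uclamp_max
subtleties the real util_fits_cpu() has.

#include <stdio.h>

/*
 * Toy model of the fit check described above:
 *   1  -> task fits the CPU
 *   0  -> task does not fit: its own util is too high for this CPU
 *  -1  -> task would fit, except that the uclamp_min performance hint
 *         asks for more capacity than this CPU offers
 */
static int fits_cpu_model(unsigned long util, unsigned long uclamp_min,
                          unsigned long uclamp_max, unsigned long capacity)
{
    unsigned long clamped = util;

    /* Clamp the utilisation into the [uclamp_min, uclamp_max] window. */
    if (clamped < uclamp_min)
        clamped = uclamp_min;
    if (clamped > uclamp_max)
        clamped = uclamp_max;

    if (util > capacity)
        return 0;       /* really does not fit */
    if (clamped > capacity)
        return -1;      /* fits, except for the uclamp_min hint */
    return 1;           /* fits */
}

int main(void)
{
    /* util 100 task with uclamp_min 700 on a capacity 500 CPU: -1 */
    printf("%d\n", fits_cpu_model(100, 700, 1024, 500));
    /* same task on a capacity 1024 CPU: 1 */
    printf("%d\n", fits_cpu_model(100, 700, 1024, 1024));
    /* util 600 task on a capacity 500 CPU: 0 */
    printf("%d\n", fits_cpu_model(600, 0, 1024, 500));
    return 0;
}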
>
> I just wanted to flag some issues I noticed with this patch and the
> entire topic.
>
> I was testing this on a Pixel 6 with a 5.18 android-mainline kernel with

Do you have more details to share about your setup?
The Android kernel carries some hacks on top of mainline; do you use any of them?
Also, perf and power can be heavily impacted by the cgroup configuration.
Do you have details on that side as well?

I'm going to try to reproduce the behavior.

> all the relevant uclamp and CFS scheduling patches backported to it from
> mainline. From what I can see, the 'uclamp fits capacity' patchset
> introduced some alarming power usage & performance issues that this
> patch makes even worse.
>
> The patch stack used for the tables below is as follows:
>
> (ufc_patched) sched/fair: unlink misfit task from cpu overutilized

I just sent a v3 which fixes a condition. I wonder whether this could have
an impact on the results, both perf and power.

> sched/uclamp: Fix a uninitialized variable warnings
> (baseline_ufc) sched/fair: Check if prev_cpu has highest spare cap in feec()
> sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition
> sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
> sched/uclamp: Make asym_fits_capacity() use util_fits_cpu()
> sched/uclamp: Make select_idle_capacity() use util_fits_cpu()
> sched/uclamp: Fix fits_capacity() check in feec()
> sched/uclamp: Make task_fits_capacity() use util_fits_cpu()
> sched/uclamp: Fix relationship between uclamp and migration margin
> (previous 'baseline' was here)
>
> I omitted the 3 patches relating directly to capacity_inversion, but the
> other tests I did with those patches included showed similar issues. It's
> probably easier to consider the uclamp parts and their effects in
> isolation.
>
> 1. Geekbench 5 (performance regression)
>
> +-----------------+----------------------------+--------+-----------+
> |     metric      |           kernel           | value  | perc_diff |
> +-----------------+----------------------------+--------+-----------+
> | multicore_score |          baseline          | 2765.4 |   0.0%    |
> | multicore_score |        baseline_ufc        | 2704.3 |  -2.21%   | <-- a noticeable score decrease already
> | multicore_score |        ufc_patched         | 2443.2 |  -11.65%  | <-- a massive score decrease
> +-----------------+----------------------------+--------+-----------+
>
> +--------------+--------+----------------------------+--------+-----------+
> |  chan_name   | metric |           kernel           | value  | perc_diff |
> +--------------+--------+----------------------------+--------+-----------+
> | total_power  | gmean  |          baseline          | 2664.0 |   0.0%    |
> | total_power  | gmean  |        baseline_ufc        | 2621.5 |   -1.6%   | <-- worse performance per watt
> | total_power  | gmean  |        ufc_patched         | 2601.2 |  -2.36%   | <-- much worse performance per watt
> +--------------+--------+----------------------------+--------+-----------+
>
> The most likely cause of the regression seen above is the decrease in the
> amount of time spent overutilized with these patches. Maximising
> overutilization is the desired outcome for GB5, as the benchmark keeps
> either 1 core or all the cores completely saturated for almost its entire
> duration, so EAS cannot be effective. These patches have the opposite of
> the desired effect in this area.
>
> +----------------------------+--------------------+--------------------+------------+
> |          kernel            |  overutilized time |     total_time     | percentage |
> +----------------------------+--------------------+--------------------+------------+
> |          baseline          |      121.979       |      181.065       |   67.46    |
> |        baseline_ufc        |      120.355       |      184.255       |   65.32    |
> |        ufc_patched         |       60.715       |      196.135       |   30.98    | <-- !!!
> +----------------------------+--------------------+--------------------+------------+
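
For context on why the overutilized time matters so much: once the root domain
is flagged overutilized, the energy-aware wake-up path is skipped and placement
falls back to the regular, performance-oriented path. Below is a toy,
self-contained model of that gating; all names are invented, the real logic
lives in find_energy_efficient_cpu() and update_overutilized_status() in
kernel/sched/fair.c.

#include <stdbool.h>
#include <stdio.h>

struct cpu_model {
    unsigned long util;
    unsigned long capacity;
};

/*
 * The root domain is considered overutilized as soon as one CPU no longer
 * has headroom for its load (roughly the ~80% rule of fits_capacity()).
 */
static bool system_overutilized(const struct cpu_model *cpus, int nr)
{
    for (int i = 0; i < nr; i++)
        if (cpus[i].util * 1280 >= cpus[i].capacity * 1024)
            return true;
    return false;
}

/* Returns a CPU index, or -1 when energy-aware placement is bypassed. */
static int pick_cpu(const struct cpu_model *cpus, int nr)
{
    if (system_overutilized(cpus, nr))
        return -1;  /* EAS skipped: the regular wake-up path takes over */

    /* Energy-aware placement would run here; toy heuristic: pack onto
     * the CPU with the least spare capacity. */
    int best = 0;
    for (int i = 1; i < nr; i++)
        if (cpus[i].capacity - cpus[i].util <
            cpus[best].capacity - cpus[best].util)
            best = i;
    return best;
}

int main(void)
{
    struct cpu_model saturated[] = { { 1000, 1024 }, { 950, 1024 } };
    struct cpu_model light[]     = { {  100, 1024 }, {  50,  512 } };

    printf("saturated: %d\n", pick_cpu(saturated, 2));  /* -1 */
    printf("light:     %d\n", pick_cpu(light, 2));      /* 1 */
    return 0;
}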

I'm not surprised: some use cases which were not actually overutilized
were wrongly detected as overutilized, switching the system back to
performance mode. You might have to tune the uclamp value.
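
If you want to experiment with that, per-task clamps can be set from userspace
with sched_setattr(2). A rough sketch follows; glibc has no wrapper, so the
struct layout and flag values are spelled out by hand here and should be
checked against your kernel's uapi headers.

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout as in include/uapi/linux/sched/types.h; glibc does not export it. */
struct sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    /* SCHED_DEADLINE fields, unused here */
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
    /* utilization clamps */
    uint32_t sched_util_min;
    uint32_t sched_util_max;
};

/* Flag values from include/uapi/linux/sched.h -- verify against your headers. */
#define SCHED_FLAG_KEEP_ALL        0x18    /* keep current policy and params */
#define SCHED_FLAG_UTIL_CLAMP_MIN  0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX  0x40

int main(void)
{
    struct sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    /* Only touch the clamps, leave policy and priority alone. */
    attr.sched_flags = SCHED_FLAG_KEEP_ALL |
                       SCHED_FLAG_UTIL_CLAMP_MIN |
                       SCHED_FLAG_UTIL_CLAMP_MAX;
    attr.sched_util_min = 0;     /* e.g. drop any boost for this experiment */
    attr.sched_util_max = 1024;

    /* pid 0 == the calling task */
    if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
        perror("sched_setattr");
        return 1;
    }
    printf("uclamp min/max set to %u/%u\n",
           attr.sched_util_min, attr.sched_util_max);
    return 0;
}

On Android the same hints typically come from the cgroup side (cpu.uclamp.min /
cpu.uclamp.max on the cpu controller), which is one more reason the cgroup
configuration matters here.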

>
> 2. Jankbench (power usage regression)
>
> +--------+---------------+---------------------------------+-------+-----------+
> | metric |   variable    |             kernel              | value | perc_diff |
> +--------+---------------+---------------------------------+-------+-----------+
> | gmean  | mean_duration |          baseline_60hz          | 14.6  |   0.0%    |
> | gmean  | mean_duration |        baseline_ufc_60hz        | 15.2  |   3.83%   |
> | gmean  | mean_duration |        ufc_patched_60hz         | 14.0  |  -4.12%   |
> +--------+---------------+---------------------------------+-------+-----------+
>
> +--------+-----------+---------------------------------+-------+-----------+
> | metric | variable  |             kernel              | value | perc_diff |
> +--------+-----------+---------------------------------+-------+-----------+
> | gmean  | jank_perc |          baseline_60hz          |  1.9  |   0.0%    |
> | gmean  | jank_perc |        baseline_ufc_60hz        |  2.2  |  15.39%   |
> | gmean  | jank_perc |        ufc_patched_60hz         |  2.0  |   3.61%   |
> +--------+-----------+---------------------------------+-------+-----------+
>
> +--------------+--------+---------------------------------+-------+-----------+
> |  chan_name   | metric |             kernel              | value | perc_diff |
> +--------------+--------+---------------------------------+-------+-----------+
> | total_power  | gmean  |          baseline_60hz          | 135.9 |   0.0%    |
> | total_power  | gmean  |        baseline_ufc_60hz        | 155.7 |  14.61%   | <-- !!!
> | total_power  | gmean  |        ufc_patched_60hz         | 157.1 |  15.63%   | <-- !!!
> +--------------+--------+---------------------------------+-------+-----------+
>
> With these patches, while running Jankbench we use ~15% more power just
> to achieve roughly the same results. I'm not sure exactly where this
> issue is coming from, but all the results above are very consistent
> across different runs.
>
> > --
> > 2.17.1
> >
> >
