Message-ID: <CAKfTPtCmDA8WPrhFc8YxFXSOPOKasvvNWA3iOmRYcC2VSyMMrw@mail.gmail.com>
Date: Fri, 13 Jan 2023 15:28:49 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Kajetan Puchalski <kajetan.puchalski@....com>
Cc: mingo@...nel.org, peterz@...radead.org, dietmar.eggemann@....com,
qyousef@...alina.io, rafael@...nel.org, viresh.kumar@...aro.org,
vschneid@...hat.com, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org, lukasz.luba@....com, wvw@...gle.com,
xuewen.yan94@...il.com, han.lin@...iatek.com,
Jonathan.JMChen@...iatek.com
Subject: Re: [PATCH v2] sched/fair: unlink misfit task from cpu overutilized
Hi Kajetan,
On Fri, 13 Jan 2023 at 14:50, Kajetan Puchalski
<kajetan.puchalski@....com> wrote:
>
> Hi,
>
> > By taking into account uclamp_min, the 1:1 relation between task misfit
> > and cpu overutilized is no longer true, as a task with a small util_avg
> > may not fit a high-capacity cpu because of its uclamp_min constraint.
> >
> > Add a new state in util_fits_cpu() to reflect the case where a task
> > would fit a CPU except for the uclamp_min hint, which is a performance
> > requirement.
> >
> > Use -1 to reflect that a CPU doesn't fit only because of uclamp_min, so
> > we can use this new value to take additional action and select the best
> > CPU that doesn't match the uclamp_min hint.
>
> I just wanted to flag some issues I noticed with this patch and the
> entire topic.
>
> I was testing this on a Pixel 6 running a 5.18 android-mainline kernel with
Do you have more details to share about your setup?
The Android kernel has some hacks on top of mainline. Do you use any
of them?
Also, perf and power can be largely impacted by the cgroup
configuration. Do you have details on that part of your setup?
I'm going to try to reproduce the behavior.
> all the relevant uclamp and CFS scheduling patches backported to it from
> mainline. From what I can see, the 'uclamp fits capacity' patchset
> introduced some alarming power usage & performance issues that this
> patch makes even worse.
>
> The patch stack for the tables below is as follows, with the most
> recent patch at the top:
>
> (ufc_patched) sched/fair: unlink misfit task from cpu overutilized
I just sent a v3 which fixes a condition. I wonder if this could have
an impact on the results, both perf and power.
> sched/uclamp: Fix a uninitialized variable warnings
> (baseline_ufc) sched/fair: Check if prev_cpu has highest spare cap in feec()
> sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition
> sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
> sched/uclamp: Make asym_fits_capacity() use util_fits_cpu()
> sched/uclamp: Make select_idle_capacity() use util_fits_cpu()
> sched/uclamp: Fix fits_capacity() check in feec()
> sched/uclamp: Make task_fits_capacity() use util_fits_cpu()
> sched/uclamp: Fix relationship between uclamp and migration margin
> (previous 'baseline' was here)
>
> I omitted the 3 patches relating directly to capacity_inversion, but
> the other tests I did with those included showed similar issues. It's
> probably easier to consider the uclamp parts and their effects in
> isolation.
>
> 1. Geekbench 5 (performance regression)
>
> +-----------------+----------------------------+--------+-----------+
> | metric | kernel | value | perc_diff |
> +-----------------+----------------------------+--------+-----------+
> | multicore_score | baseline | 2765.4 | 0.0% |
> | multicore_score | baseline_ufc | 2704.3 | -2.21% | <-- a noticeable score decrease already
> | multicore_score | ufc_patched | 2443.2 | -11.65% | <-- a massive score decrease
> +-----------------+----------------------------+--------+-----------+
>
> +--------------+--------+----------------------------+--------+-----------+
> | chan_name | metric | kernel | value | perc_diff |
> +--------------+--------+----------------------------+--------+-----------+
> | total_power | gmean | baseline | 2664.0 | 0.0% |
> | total_power | gmean | baseline_ufc | 2621.5 | -1.6% | <-- worse performance per watt
> | total_power | gmean | ufc_patched | 2601.2 | -2.36% | <-- much worse performance per watt
> +--------------+--------+----------------------------+--------+-----------+
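
(Dividing the two tables: score per unit power comes out to
2765.4 / 2664.0 = 1.04 for baseline, 2704.3 / 2621.5 = 1.03 for
baseline_ufc and 2443.2 / 2601.2 = 0.94 for ufc_patched, i.e. roughly
9.5% worse perf per watt than baseline for the patched kernel.)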
>
> The most likely cause of the regression seen above is the decrease in
> the amount of time spent overutilized with these patches. Maximising
> overutilization is the desired outcome for GB5: the benchmark keeps
> either 1 core or all the cores completely saturated for almost its
> entire duration, so EAS cannot be effective. These patches have the
> opposite of the desired effect here, as the table and the sketch
> below it show.
>
> +----------------------------+--------------------+--------------------+------------+
> | kernel                     | overutilized time  | total time         | percentage |
> +----------------------------+--------------------+--------------------+------------+
> | baseline                   | 121.979            | 181.065            | 67.46      |
> | baseline_ufc               | 120.355            | 184.255            | 65.32      |
> | ufc_patched                | 60.715             | 196.135            | 30.98      | <-- !!!
> +----------------------------+--------------------+--------------------+------------+
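
For reference, the mechanism at play: once the root domain is flagged
overutilized, mainline roughly bails out of find_energy_efficient_cpu()
and falls back to the regular, performance-first wakeup path. A toy,
compilable model of that gate (not the kernel code):

#include <stdbool.h>
#include <stdio.h>

struct root_domain { bool overutilized; };

/* Returns the chosen CPU, or -1 to tell the caller to fall back to
 * the regular, performance-first wakeup path. */
static int find_energy_efficient_cpu(struct root_domain *rd)
{
	/* The whole energy-aware search is skipped once any CPU in
	 * the root domain has been flagged overutilized, so for a
	 * saturating benchmark like GB5, more time overutilized means
	 * more time on the performance-oriented path. */
	if (rd->overutilized)
		return -1;

	/* ... energy-aware CPU selection would happen here ... */
	return 0;
}

int main(void)
{
	struct root_domain rd = { .overutilized = true };

	printf("%d\n", find_energy_efficient_cpu(&rd));	/* prints -1 */
	return 0;
}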
I'm not surprised, because some use cases that were not actually
overutilized were wrongly being triggered as overutilized, which
switched the system back to performance mode. You might have to tune
the uclamp value.
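
For anyone who wants to experiment with that, one way to set a
per-task uclamp value from user space is the sched_setattr() syscall
(available since kernel 5.3; the value 256 below is an arbitrary
example):

#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/sched.h>	/* SCHED_FLAG_* */
#include <linux/sched/types.h>	/* struct sched_attr */

int main(void)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	/* Keep the current policy/params, only update the clamp. */
	attr.sched_flags = SCHED_FLAG_KEEP_ALL | SCHED_FLAG_UTIL_CLAMP_MIN;
	attr.sched_util_min = 256;	/* arbitrary example boost */

	/* pid 0 == calling thread; no glibc wrapper, use syscall(). */
	return syscall(SYS_sched_setattr, 0, &attr, 0);
}

The same can also be set per cgroup through cpu.uclamp.min with
cgroup v2.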
>
> 2. Jankbench (power usage regression)
>
> +--------+---------------+---------------------------------+-------+-----------+
> | metric | variable | kernel | value | perc_diff |
> +--------+---------------+---------------------------------+-------+-----------+
> | gmean | mean_duration | baseline_60hz | 14.6 | 0.0% |
> | gmean | mean_duration | baseline_ufc_60hz | 15.2 | 3.83% |
> | gmean | mean_duration | ufc_patched_60hz | 14.0 | -4.12% |
> +--------+---------------+---------------------------------+-------+-----------+
>
> +--------+-----------+---------------------------------+-------+-----------+
> | metric | variable | kernel | value | perc_diff |
> +--------+-----------+---------------------------------+-------+-----------+
> | gmean | jank_perc | baseline_60hz | 1.9 | 0.0% |
> | gmean | jank_perc | baseline_ufc_60hz | 2.2 | 15.39% |
> | gmean | jank_perc | ufc_patched_60hz | 2.0 | 3.61% |
> +--------+-----------+---------------------------------+-------+-----------+
>
> +--------------+--------+---------------------------------+-------+-----------+
> | chan_name | metric | kernel | value | perc_diff |
> +--------------+--------+---------------------------------+-------+-----------+
> | total_power | gmean | baseline_60hz | 135.9 | 0.0% |
> | total_power | gmean | baseline_ufc_60hz | 155.7 | 14.61% | <-- !!!
> | total_power | gmean | ufc_patched_60hz | 157.1 | 15.63% | <-- !!!
> +--------------+--------+---------------------------------+-------+-----------+
>
> With these patches, Jankbench uses ~15% more power just to achieve
> roughly the same results. I'm not sure exactly where this issue comes
> from, but all the results above are very consistent across different
> runs.
>