linux-kernel - Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

Message-ID: <CAGXk5yo4YOBQkt3DmWkipJgWqU5+00Ahsw_BaFJnwigR2iRmgA@mail.gmail.com>
Date:   Wed, 5 Oct 2022 09:57:17 -0700
From:   Wei Wang <wvw@...gle.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Kajetan Puchalski <kajetan.puchalski@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Jian-Min Liu <jian-min.liu@...iatek.com>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Vincent Donnefort <vdonnefort@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Abhijeet Dharmapurikar <adharmap@...cinc.com>,
        Qais Yousef <qais.yousef@....com>,
        linux-kernel@...r.kernel.org,
        Jonathan JMChen <jonathan.jmchen@...iatek.com>,
        "Chung-Kai (Michael) Mei" <chungkai@...gle.com>
Subject: Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

On Tue, Oct 4, 2022 at 2:33 AM Dietmar Eggemann
<dietmar.eggemann@....com> wrote:
>
> Hi Wei,
>
> On 04/10/2022 00:57, Wei Wang wrote:
>
> Please don't do top-posting.
>

Sorry, forgot this was posted to the list...

> > We have some data on an earlier build of Pixel 6a, which also runs a
> > slightly modified "sched" governor. The tuning definitely has both
> > performance and power impact on UX. With some additional user space
> > hints such as ADPF (Android Dynamic Performance Framework) and/or the
> > old-fashioned INTERACTION power hint, different trade-offs can be
> > achieved with this sort of tuning.
> >
> >
> > +---------------------------------------------------------+----------+----------+
> > |                         Metrics                         |   32ms   |   8ms    |
> > +---------------------------------------------------------+----------+----------+
> > | Sum of gfxinfo_com.android.test.uibench_deadline_missed |   185.00 |   112.00 |
> > | Sum of SFSTATS_GLOBAL_MISSEDFRAMES                      |    62.00 |    49.00 |
> > | CPU Power                                               | 6,204.00 | 7,040.00 |
> > | Sum of Gfxinfo.frame.95th                               |   582.00 |   506.00 |
> > | Avg of Gfxinfo.frame.95th                               |    18.19 |    15.81 |
> > +---------------------------------------------------------+----------+----------+
>
> Which App is package `gfxinfo_com.android.test`? Is this UIBench? Never
> ran it.
>

Yes.

> I'm familiar with `dumpsys gfxinfo <PACKAGE_NAME>`.
>
> # adb shell dumpsys gfxinfo <PACKAGE_NAME>
>
> ...
> ** Graphics info for pid XXXX [<PACKAGE_NAME>] **
> ...
> 95th percentile: XXms            <-- (a)
> ...
> Number Frame deadline missed: XX <-- (b)
> ...
>
>
> I assume that `Gfxinfo.frame.95th` is related to (a) and
> `gfxinfo_com.android.test.uibench_deadline_missed` to (b)? Not sure
> where `SFSTATS_GLOBAL_MISSEDFRAMES` is coming from?
>

a) is correct; b) is from surfaceflinger. The Android display pipeline
involves both a) the app (generation) and b) surfaceflinger
(presentation).
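
For anyone who wants to collect the two `dumpsys gfxinfo` fields quoted
above in a script, here is a minimal sketch. It assumes the output lines
look exactly like the quoted ones; the sample text, the package name and
the function name `parse_gfxinfo` are made up for illustration and are not
part of any Android tooling.

```python
import re

# Hypothetical sample shaped like the quoted output; the numbers are made up.
SAMPLE = """\
** Graphics info for pid 1234 [com.example.app] **
95th percentile: 18ms
Number Frame deadline missed: 7
"""

def parse_gfxinfo(text):
    """Return (95th percentile in ms, deadline-missed count); None where absent."""
    p95 = re.search(r"^\s*95th percentile:\s*(\d+)ms", text, re.MULTILINE)
    missed = re.search(r"^\s*Number Frame deadline missed:\s*(\d+)", text,
                       re.MULTILINE)
    return (int(p95.group(1)) if p95 else None,
            int(missed.group(1)) if missed else None)

print(parse_gfxinfo(SAMPLE))
```

In practice the text would come from `adb shell dumpsys gfxinfo <PACKAGE_NAME>`
rather than from an inline string.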

> What's the Sum here? Is it that you ran the test 32 times (582/18.19 = 32)?
>

Uibench[1] has several micro tests and it is the sum of those tests.


[1]: https://cs.android.com/android/platform/superproject/+/master:platform_testing/tests/microbenchmarks/uibench/src/com/android/uibench/microbenchmark/
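
To make the trade-off in the 32ms vs. 8ms table easier to read, the
relative changes and the number of micro tests implied by Sum/Avg can be
worked out as below. This is only a sketch over the numbers quoted in the
table; the dictionary layout and the helper name `pct_change` are
illustrative, and the unit of "CPU Power" is not stated in the thread.

```python
# Values copied from the table above: metric -> (32ms, 8ms).
RESULTS = {
    "Sum of uibench_deadline_missed": (185.00, 112.00),
    "Sum of SFSTATS_GLOBAL_MISSEDFRAMES": (62.00, 49.00),
    "CPU Power": (6204.00, 7040.00),
    "Sum of Gfxinfo.frame.95th": (582.00, 506.00),
    "Avg of Gfxinfo.frame.95th": (18.19, 15.81),
}

def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0

for metric, (hl32, hl8) in RESULTS.items():
    print(f"{metric:36s} {pct_change(hl32, hl8):+7.2f}%")

# Sum / Avg of the 95th percentile gives the number of micro tests summed.
sum32, sum8 = RESULTS["Sum of Gfxinfo.frame.95th"]
avg32, avg8 = RESULTS["Avg of Gfxinfo.frame.95th"]
print("tests (32ms column):", round(sum32 / avg32))
print("tests (8ms column): ", round(sum8 / avg8))
```

On these numbers the 8ms setting costs roughly 13% more CPU power for
about 39% fewer missed app deadlines, and both columns point to 32 summed
results.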


> [...]
>
> > On Thu, Sep 29, 2022 at 11:59 PM Kajetan Puchalski
> > <kajetan.puchalski@....com> wrote:
> >>
> >> On Thu, Sep 29, 2022 at 01:21:45PM +0200, Peter Zijlstra wrote:
> >>> On Thu, Sep 29, 2022 at 12:10:17PM +0100, Kajetan Puchalski wrote:
> >>>
> >>>> Overall, the problem being solved here is that based on our testing the
> >>>> PELT half life can occasionally be too slow to keep up in scenarios
> >>>> where many frames need to be rendered quickly, especially on high-refresh
> >>>> rate phones and similar devices.
> >>>
> >>> But it is a problem of DVFS not ramping up quick enough; or of the
> >>> load-balancer not reacting to the increase in load, or what aspect
> >>> controlled by PELT is responsible for the improvement seen?
> >>
> >> Based on all the tests we've seen, jankbench or otherwise, the
> >> improvement can mainly be attributed to the faster ramp up of frequency
> >> caused by the shorter PELT window while using schedutil. Alongside that
> >> the signals rising faster also mean that the task would get migrated
> >> faster to bigger CPUs on big.LITTLE systems which improves things too
> >> but it's mostly the frequency aspect of it.
> >>
> >> To establish that this benchmark is sensitive to frequency I ran some
> >> tests using the 'performance' cpufreq governor.
> >>
> >> Max frame duration (ms)
> >>
> >> +------------------+-------------+----------+
> >> | kernel           |   iteration |    value |
> >> |------------------+-------------+----------|
> >> | pelt_1           |          10 | 157.426  |
> >> | pelt_4           |          10 |  85.2713 |
> >> | performance      |          10 |  40.9308 |
> >> +------------------+-------------+----------+
> >>
> >> Mean frame duration (ms)
> >>
> >> +---------------+------------------+---------+-------------+
> >> | variable      | kernel           |   value | perc_diff   |
> >> |---------------+------------------+---------+-------------|
> >> | mean_duration | pelt_1           |    14.6 | 0.0%        |
> >> | mean_duration | pelt_4           |    14.5 | -0.58%      |
> >> | mean_duration | performance      |     4.4 | -69.75%     |
> >> +---------------+------------------+---------+-------------+
> >>
> >> Jank percentage
> >>
> >> +------------+------------------+---------+-------------+
> >> | variable   | kernel           |   value | perc_diff   |
> >> |------------+------------------+---------+-------------|
> >> | jank_perc  | pelt_1           |     2.1 | 0.0%        |
> >> | jank_perc  | pelt_4           |     2   | -3.46%      |
> >> | jank_perc  | performance      |     0.1 | -97.25%     |
> >> +------------+------------------+---------+-------------+
> >>
> >> As you can see, bumping up frequency can hugely improve the results
> >> here. This is what's happening when we decrease the PELT window, just on
> >> a much smaller and not as drastic scale. It also explains specifically
> >> where the increased power usage is coming from.
>
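
As a rough illustration of why a shorter half-life ramps frequency
faster, the PELT signal of a task that starts from zero and then runs
continuously can be approximated as a geometric series. The sketch below
uses a continuous-time simplification; it ignores the 1024us period
granularity, frequency and CPU invariance, and any margin schedutil
applies, so the numbers are indicative only. The function names are made
up.

```python
import math

MAX_UTIL = 1024  # full-scale utilization (capacity scale used by the scheduler)

def util_after(ms, halflife_ms):
    """Approximate utilization after `ms` of continuous running from zero."""
    return MAX_UTIL * (1.0 - 0.5 ** (ms / halflife_ms))

def ms_to_reach(fraction, halflife_ms):
    """Time in ms for the signal to reach `fraction` of MAX_UTIL (0 <= fraction < 1)."""
    return -halflife_ms * math.log2(1.0 - fraction)

for halflife in (32, 8):
    print(f"half-life {halflife:2d}ms: "
          f"util after 16ms = {util_after(16, halflife):6.1f}, "
          f"time to 80% = {ms_to_reach(0.8, halflife):5.1f}ms")
```

Under this model the time to reach any given utilization scales linearly
with the half-life, so going from 32ms to 8ms makes the signal reach the
same level four times sooner.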