Message-ID: <CAGXk5yoC+whmLQn-KvUE3_rGGj4jodsKushr5LmtPK0mi6DFEQ@mail.gmail.com>
Date:   Mon, 3 Oct 2022 15:57:02 -0700
From:   Wei Wang <wvw@...gle.com>
To:     Kajetan Puchalski <kajetan.puchalski@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Jian-Min Liu <jian-min.liu@...iatek.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Vincent Donnefort <vdonnefort@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Abhijeet Dharmapurikar <adharmap@...cinc.com>,
        Qais Yousef <qais.yousef@....com>,
        linux-kernel@...r.kernel.org,
        Jonathan JMChen <jonathan.jmchen@...iatek.com>,
        "Chung-Kai (Michael) Mei" <chungkai@...gle.com>
Subject: Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

We have some data on an earlier build of Pixel 6a, which also runs a
slightly modified "sched" governor. The tuning definitely has both a
performance and a power impact on UX. With some additional user space
hints such as ADPF (Android Dynamic Performance Framework) and/or the
old-fashioned INTERACTION power hint, different trade-offs can be
achieved with this sort of tuning.


+---------------------------------------------------------+----------+----------+
|                         Metrics                         |   32ms   |   8ms    |
+---------------------------------------------------------+----------+----------+
| Sum of gfxinfo_com.android.test.uibench_deadline_missed |   185.00 |   112.00 |
| Sum of SFSTATS_GLOBAL_MISSEDFRAMES                      |    62.00 |    49.00 |
| CPU Power                                               | 6,204.00 | 7,040.00 |
| Sum of Gfxinfo.frame.95th                               |   582.00 |   506.00 |
| Avg of Gfxinfo.frame.95th                               |    18.19 |    15.81 |
+---------------------------------------------------------+----------+----------+
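
For context on why a shorter half-life moves these numbers at all: PELT
tracks utilization as a geometrically decaying sum, so the signal of an
always-running task converges towards 1024 at a rate set by the
half-life. A rough sketch of that ramp-up (assuming clean 1 ms segments
and ignoring the sub-millisecond remainder handling in
kernel/sched/pelt.c; pelt_util() below is purely illustrative, not the
kernel code):

#include <math.h>
#include <stdio.h>

/*
 * Approximate PELT utilization of a task that has been running
 * continuously for runtime_ms, starting from zero, with the decay
 * factor y chosen so that y^halflife_ms == 0.5.
 */
static double pelt_util(double halflife_ms, double runtime_ms)
{
	double y = pow(0.5, 1.0 / halflife_ms);

	/* geometric series: util ~= 1024 * (1 - y^t) */
	return 1024.0 * (1.0 - pow(y, runtime_ms));
}

int main(void)
{
	printf("%8s %10s %10s\n", "time(ms)", "hl=32ms", "hl=8ms");
	for (double t = 4.0; t <= 64.0; t *= 2.0)
		printf("%8.0f %10.1f %10.1f\n", t,
		       pelt_util(32.0, t), pelt_util(8.0, t));
	return 0;
}

Under this model an 8 ms half-life crosses any given utilization
threshold about four times sooner than 32 ms, which lines up with the
fewer missed frames above and also with the higher CPU power, since
schedutil reaches high frequencies earlier and more often.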





On Thu, Sep 29, 2022 at 11:59 PM Kajetan Puchalski
<kajetan.puchalski@....com> wrote:
>
> On Thu, Sep 29, 2022 at 01:21:45PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 29, 2022 at 12:10:17PM +0100, Kajetan Puchalski wrote:
> >
> > > Overall, the problem being solved here is that based on our testing the
> > > PELT half life can occasionally be too slow to keep up in scenarios
> > > where many frames need to be rendered quickly, especially on high-refresh
> > > rate phones and similar devices.
> >
> > But it is a problem of DVFS not ramping up quick enough; or of the
> > load-balancer not reacting to the increase in load, or what aspect
> > controlled by PELT is responsible for the improvement seen?
>
> Based on all the tests we've seen, jankbench or otherwise, the
> improvement can mainly be attributed to the faster ramp-up of frequency
> caused by the shorter PELT window while using schedutil. Alongside that,
> the signals rising faster also mean that tasks get migrated to bigger
> CPUs sooner on big.LITTLE systems, which helps as well, but it's mostly
> the frequency aspect of it.
>
> To establish that this benchmark is sensitive to frequency I ran some
> tests using the 'performance' cpufreq governor.
>
> Max frame duration (ms)
>
> +------------------+-------------+----------+
> | kernel           |   iteration |    value |
> |------------------+-------------+----------|
> | pelt_1           |          10 | 157.426  |
> | pelt_4           |          10 |  85.2713 |
> | performance      |          10 |  40.9308 |
> +------------------+-------------+----------+
>
> Mean frame duration (ms)
>
> +---------------+------------------+---------+-------------+
> | variable      | kernel           |   value | perc_diff   |
> |---------------+------------------+---------+-------------|
> | mean_duration | pelt_1           |    14.6 | 0.0%        |
> | mean_duration | pelt_4           |    14.5 | -0.58%      |
> | mean_duration | performance      |     4.4 | -69.75%     |
> +---------------+------------------+---------+-------------+
>
> Jank percentage
>
> +------------+------------------+---------+-------------+
> | variable   | kernel           |   value | perc_diff   |
> |------------+------------------+---------+-------------|
> | jank_perc  | pelt_1           |     2.1 | 0.0%        |
> | jank_perc  | pelt_4           |     2   | -3.46%      |
> | jank_perc  | performance      |     0.1 | -97.25%     |
> +------------+------------------+---------+-------------+
>
> As you can see, bumping up the frequency can hugely improve the results
> here. The same thing happens when we decrease the PELT window, just on
> a much smaller, less drastic scale. It also explains specifically
> where the increased power usage is coming from.
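
To connect the frequency point to the PELT window: under schedutil the
requested frequency is (roughly) linear in the PELT utilization with
about 25% headroom on top, so anything that makes the utilization
signal rise faster pulls the frequency request up correspondingly
faster. A simplified model of that mapping, in the spirit of
get_next_freq()/map_util_freq() but leaving out rate limiting, clamping
and frequency-invariance corrections (model_next_freq() is illustrative
only):

static unsigned long model_next_freq(unsigned long util,
				     unsigned long max_util,
				     unsigned long max_freq)
{
	/* add ~25% headroom, then scale linearly with utilization */
	unsigned long freq = max_freq + (max_freq >> 2);

	return freq * util / max_util;
}

With a shorter PELT window the utilization fed into this mapping ramps
up sooner, so the governor asks for higher OPPs earlier, which is where
both the jank improvement and the extra power in the table at the top
come from.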
