linux-kernel - Re: [PATCH] PM / EM: Inefficient OPPs detection

Message-ID: <YILydL1QDxvuiFde@google.com>
Date:   Fri, 23 Apr 2021 16:14:44 +0000
From:   Quentin Perret <qperret@...gle.com>
To:     Vincent Donnefort <vincent.donnefort@....com>
Cc:     peterz@...radead.org, rjw@...ysocki.net, viresh.kumar@...aro.org,
        vincent.guittot@...aro.org, linux-kernel@...r.kernel.org,
        ionela.voinescu@....com, lukasz.luba@....com,
        dietmar.eggemann@....com
Subject: Re: [PATCH] PM / EM: Inefficient OPPs detection

On Thursday 22 Apr 2021 at 16:36:44 (+0100), Vincent Donnefort wrote:
> > > As used in the hot-path, the efficient table is a lookup table, generated
> > > dynamically when the perf domain is created. The complexity of searching
> > > a performance state is hence changed from O(n) to O(1). This also
> > > speeds-up em_cpu_energy() even if no inefficient OPPs have been found.
> > 
> > Interesting. Do you have measurements showing the benefits on wake-up
> > duration? I remember doing so by hacking the wake-up path to force tasks
> > into feec()/compute_energy() even when overutilized, and then running
> > hackbench. Maybe something like that would work for you?
> > 
> > Just want to make sure we actually need all that complexity -- while
> > it's good to reduce the asymptotic complexity, we're looking at a rather
> > small problem (max 30 OPPs or so I expect?), so other effects may be
> > dominating. Simply skipping inefficient OPPs could be implemented in a
> > much simpler way I think.
> > 
> > Thanks,
> > Quentin
> 
> On the Pixel4, I used rt-app to generate a task whose duty cycle increases
> with each phase. Then, for each rt-app task placement, I measured how long
> find_energy_efficient_cpu() took to run. I repeated the operation several
> times to increase the sample count. Here's what I got:
> 
> ┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐
> │ Phase  │ duty-cycle  │  CPU  │     w/o LUT    │    w/  LUT    │               │
> │        │             │       ├────────┬───────┼───────┬───────┤      Diff     │
> │        │             │       │ Mean   │ count │ Mean  │ count │               │
> ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> │   0    │    12.5%    │ Little│ 10791  │ 3124  │ 10657 │ 3741  │  -1.2% -134ns │
> ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> │   1    │    25%      │  Mid  │ 2924   │ 3097  │ 2894  │ 3740  │  -1%  -30ns   │
> ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> │   2    │    37.5%    │  Mid  │ 2207   │ 3104  │ 2162  │ 3740  │  -2%  -45ns   │
> ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> │   3    │    50%      │  Mid  │ 1897   │ 3119  │ 1864  │ 3717  │  -1.7% -33ns  │
> ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> │        │             │  Mid  │ 1700   │  396  │ 1609  │ 1232  │  -5.4% -91ns  │
> │   4    │    62.5%    ├───────┼────────┼───────┼───────┼───────┼───────────────┤
> │        │             │  Big  │ 1187   │ 2729  │ 1129  │ 2518  │  -4.9% -58ns  │
> ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> │   5    │    75%      │  Big  │  984   │ 3124  │  900  │ 3693  │  -8.5% -84ns  │
> └────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘

Thanks for that. Do you have the stddev handy?
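
If the per-call durations are still available, the spread can be accumulated
in a single pass alongside the mean. The following is only a minimal sketch
using Welford's online algorithm; the names running_stats, stats_add() and
stats_variance() are hypothetical and are not part of the patch or of any
kernel API. The standard deviation is the square root of the returned
variance.

```c
#include <stddef.h>

/* Hypothetical accumulator for one table row (e.g. "Phase 5, Big, w/ LUT"). */
struct running_stats {
	unsigned long count;	/* number of samples seen so far */
	double mean;		/* running mean of the samples */
	double m2;		/* running sum of squared deviations from the mean */
};

/* Welford's update: fold one duration (assumed to be in ns) into the stats. */
void stats_add(struct running_stats *s, double sample)
{
	double delta = sample - s->mean;

	s->count++;
	s->mean += delta / (double)s->count;
	s->m2 += delta * (sample - s->mean);
}

/* Unbiased sample variance; take its square root to get the stddev. */
double stats_variance(const struct running_stats *s)
{
	if (s->count < 2)
		return 0.0;
	return s->m2 / (double)(s->count - 1);
}
```

One accumulator per (phase, CPU, configuration) row would be enough, since
the update needs no storage beyond the three fields.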

> Notice:
> 
>   * The CPU column describes which CPU ran the find_energy_efficient_cpu()
>     function.
> 
>   * I modified my patch so that no inefficient OPPs are reported. This is to
>     have a fairer comparison between the original table walk and the lookup
>     table.

You mean to avoid the impact of the frequency selection itself? Maybe
pinning the frequencies in the cpufreq policy could do?

> 
>   * I removed from the table the results whose sample count was too low to
>     be statistically significant.


Anyway, this looks like a small but consistent gain throughout, so it's a
win for the LUT :)
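
For reference, the two approaches being compared can be sketched roughly as
below. This is not the code from the patch: struct perf_state is a simplified
stand-in, and SCALE, util_to_freq(), ps_walk(), lut_build() and ps_lookup()
are hypothetical names. The sketch assumes a table sorted by ascending
frequency, fewer than 256 states, and a LUT indexed directly by utilization
in the range 0..1024; it also ignores any headroom margin applied when
mapping utilization to frequency.

```c
#include <stddef.h>

#define SCALE 1024U	/* assumed full-scale utilization */

/* Hypothetical, simplified performance state (table sorted by frequency). */
struct perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* arbitrary units, unused in this sketch */
};

/* Map a utilization value to the frequency it requires (no margin applied). */
unsigned long util_to_freq(unsigned int util, unsigned long max_freq)
{
	return (unsigned long)((unsigned long long)util * max_freq / SCALE);
}

/* O(n) walk: index of the lowest state whose frequency covers the request. */
size_t ps_walk(const struct perf_state *ps, size_t n, unsigned long freq)
{
	size_t i;

	for (i = 0; i < n - 1; i++)
		if (ps[i].frequency >= freq)
			break;
	return i;
}

/* Build once at table creation: lut[util] caches the result of the walk. */
void lut_build(unsigned char *lut, const struct perf_state *ps, size_t n)
{
	unsigned long max_freq = ps[n - 1].frequency;
	size_t i = 0;
	unsigned int util;

	for (util = 0; util <= SCALE; util++) {
		unsigned long freq = util_to_freq(util, max_freq);

		/* freq never decreases with util, so i only moves forward. */
		while (i < n - 1 && ps[i].frequency < freq)
			i++;
		lut[util] = (unsigned char)i;
	}
}

/* O(1) hot-path lookup; out-of-range utilization is clamped to SCALE. */
size_t ps_lookup(const unsigned char *lut, unsigned int util)
{
	return lut[util > SCALE ? SCALE : util];
}
```

The LUT in this sketch costs SCALE + 1 bytes per table and trades the short
loop for a single indexed load, which is the kind of trade-off the numbers
above are measuring.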

Thanks,
Quentin