linux-kernel - Re: [PATCH] PM / EM: Inefficient OPPs detection

Message-ID: <20210422153644.GA316798@e124901.cambridge.arm.com>
Date:   Thu, 22 Apr 2021 16:36:44 +0100
From:   Vincent Donnefort <vincent.donnefort@....com>
To:     Quentin Perret <qperret@...gle.com>
Cc:     peterz@...radead.org, rjw@...ysocki.net, viresh.kumar@...aro.org,
        vincent.guittot@...aro.org, linux-kernel@...r.kernel.org,
        ionela.voinescu@....com, lukasz.luba@....com,
        dietmar.eggemann@....com
Subject: Re: [PATCH] PM / EM: Inefficient OPPs detection

> > As used in the hot-path, the efficient table is a lookup table, generated
> > dynamically when the perf domain is created. The complexity of searching
> > a performance state is hence changed from O(n) to O(1). This also
> > speeds-up em_cpu_energy() even if no inefficient OPPs have been found.
> 
> Interesting. Do you have measurements showing the benefits on wake-up
> duration? I remember doing so by hacking the wake-up path to force tasks
> into feec()/compute_energy() even when overutilized, and then running
> hackbench. Maybe something like that would work for you?
> 
> Just want to make sure we actually need all that complexity -- while
> it's good to reduce the asymptotic complexity, we're looking at a rather
> small problem (max 30 OPPs or so I expect?), so other effects may be
> dominating. Simply skipping inefficient OPPs could be implemented in a
> much simpler way I think.
> 
> Thanks,
> Quentin
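The quoted patch description contrasts an O(n) walk over the performance states with an O(1) lookup table built when the perf domain is created. The userspace model below is a sketch of that idea under stated assumptions. It is not the patch's code. `struct perf_state`, `LUT_RES_MHZ`, `find_state_linear()` and `find_state_lut()` are hypothetical names. OPP frequencies are assumed to be multiples of the bucket width, and an OPP is treated as inefficient when a higher OPP has a strictly lower cost. `find_state_linear()` also shows one possible form of the simpler alternative Quentin mentions: skipping inefficient OPPs during the walk itself.

```c
/*
 * Hypothetical, simplified model of one performance domain; this is NOT the
 * kernel's struct em_perf_state. Frequencies are in MHz and are assumed to be
 * multiples of LUT_RES_MHZ, which makes the lookup table exact.
 */
struct perf_state {
	unsigned long freq;	/* MHz */
	unsigned long cost;	/* energy per unit of work, arbitrary units */
	int inefficient;	/* set when a higher OPP has a lower cost */
};

#define NR_STATES	5
#define LUT_RES_MHZ	100UL	/* hypothetical bucket width */
#define MAX_FREQ_MHZ	1500UL
#define NR_BUCKETS	(MAX_FREQ_MHZ / LUT_RES_MHZ + 1)

static struct perf_state table[NR_STATES] = {
	{  300, 100, 0 },
	{  600, 180, 0 },	/* costs more than 900 MHz: inefficient */
	{  900, 150, 0 },
	{ 1200, 200, 0 },
	{ 1500, 260, 0 },
};

static int lut[NR_BUCKETS];	/* bucket index -> perf state index */

/* Walk from the top: an OPP is inefficient if a higher one costs less. */
static void mark_inefficient(void)
{
	unsigned long min_cost = table[NR_STATES - 1].cost;

	for (int i = NR_STATES - 2; i >= 0; i--) {
		if (table[i].cost > min_cost)
			table[i].inefficient = 1;
		else
			min_cost = table[i].cost;
	}
}

/* O(n): first efficient OPP able to serve freq, else the highest OPP. */
static int find_state_linear(unsigned long freq)
{
	for (int i = 0; i < NR_STATES; i++) {
		if (table[i].freq >= freq && !table[i].inefficient)
			return i;
	}
	return NR_STATES - 1;
}

/* Built once (e.g. at perf domain creation), so O(n) per bucket is fine. */
static void build_lut(void)
{
	for (unsigned long b = 0; b < NR_BUCKETS; b++)
		lut[b] = find_state_linear(b * LUT_RES_MHZ);
}

/* O(1): round freq up to a bucket boundary, clamp, then index the table. */
static int find_state_lut(unsigned long freq)
{
	unsigned long b = (freq + LUT_RES_MHZ - 1) / LUT_RES_MHZ;

	if (b >= NR_BUCKETS)
		b = NR_BUCKETS - 1;
	return lut[b];
}
```

Every OPP frequency here is a multiple of the bucket width, so rounding a request up to the next boundary never skips past an OPP. As a result, both lookups return the same state for every frequency. With arbitrary OPP frequencies, the rounding could select a higher OPP than needed unless the resolution is made finer. As Quentin points out, with at most a few tens of OPPs, constant factors may matter more than the asymptotic gain.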

On the Pixel 4, I used rt-app to generate a task whose duty cycle increases
with each phase. Then, for each placement of the rt-app task, I measured how
long find_energy_efficient_cpu() took to run. I repeated the operation several
times to increase the sample count. Here's what I've got:

┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐
│ Phase  │ duty-cycle  │  CPU  │     w/o LUT    │    w/  LUT    │               │
│        │             │       ├────────┬───────┼───────┬───────┤      Diff     │
│        │             │       │ Mean   │ count │ Mean  │ count │               │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│   0    │    12.5%    │ Little│ 10791  │ 3124  │ 10657 │ 3741  │  -1.2% -134ns │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│   1    │    25%      │  Mid  │ 2924   │ 3097  │ 2894  │ 3740  │  -1%  -30ns   │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│   2    │    37.5%    │  Mid  │ 2207   │ 3104  │ 2162  │ 3740  │  -2%  -45ns   │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│   3    │    50%      │  Mid  │ 1897   │ 3119  │ 1864  │ 3717  │  -1.7% -33ns  │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│        │             │  Mid  │ 1700   │  396  │ 1609  │ 1232  │  -5.4% -91ns  │
│   4    │    62.5%    ├───────┼────────┼───────┼───────┼───────┼───────────────┤
│        │             │  Big  │ 1187   │ 2729  │ 1129  │ 2518  │  -4.9% -58ns  │
├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
│   5    │    75%      │  Big  │  984   │ 3124  │  900  │ 3693  │  -8.5% -84ns  │
└────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘

Notes:

  * The CPU column indicates which CPU type (Little, Mid or Big) ran
    find_energy_efficient_cpu().

  * Means are in nanoseconds, and count is the number of measured calls.

  * I modified my patch so that no OPP is reported as inefficient. This makes
    the comparison between the original table walk and the lookup table
    fairer.

  * I removed the results whose sample count was too low to be statistically
    significant.
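The Diff column is the difference between the two means. It is given both in nanoseconds and as a percentage of the w/o LUT mean. The small sketch below recomputes it from the table's values. The struct and helper names are illustrative only.

```c
#include <stdio.h>

/* Mean find_energy_efficient_cpu() durations in ns, from the table above. */
struct fecpu_row {
	const char *label;
	double mean_wo_lut;	/* ns, without the lookup table */
	double mean_w_lut;	/* ns, with the lookup table */
};

static const struct fecpu_row rows[] = {
	{ "phase 0, Little", 10791, 10657 },
	{ "phase 1, Mid",     2924,  2894 },
	{ "phase 2, Mid",     2207,  2162 },
	{ "phase 3, Mid",     1897,  1864 },
	{ "phase 4, Mid",     1700,  1609 },
	{ "phase 4, Big",     1187,  1129 },
	{ "phase 5, Big",      984,   900 },
};

#define NR_ROWS (sizeof(rows) / sizeof(rows[0]))

/* Absolute change in ns; negative means the LUT version is faster. */
static double diff_ns(const struct fecpu_row *r)
{
	return r->mean_w_lut - r->mean_wo_lut;
}

/* Relative change, as a percentage of the w/o LUT mean. */
static double diff_pct(const struct fecpu_row *r)
{
	return 100.0 * diff_ns(r) / r->mean_wo_lut;
}

/* Print one line per row in the same "pct ns" form as the Diff column. */
static void print_diffs(void)
{
	for (size_t i = 0; i < NR_ROWS; i++)
		printf("%-16s %+6.1f%% %+5.0fns\n", rows[i].label,
		       diff_pct(&rows[i]), diff_ns(&rows[i]));
}
```

After rounding, print_diffs() reproduces the Diff column. For phase 0, for example, it prints -1.2% and -134ns.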

-- 
Vincent.