linux-kernel - Re: [PATCH] PM / EM: Inefficient OPPs detection

Message-ID: <20210428144609.GB71893@e120877-lin.cambridge.arm.com>
Date:   Wed, 28 Apr 2021 15:46:10 +0100
From:   Vincent Donnefort <vincent.donnefort@....com>
To:     Quentin Perret <qperret@...gle.com>
Cc:     peterz@...radead.org, rjw@...ysocki.net, viresh.kumar@...aro.org,
        vincent.guittot@...aro.org, linux-kernel@...r.kernel.org,
        ionela.voinescu@....com, lukasz.luba@....com,
        dietmar.eggemann@....com
Subject: Re: [PATCH] PM / EM: Inefficient OPPs detection

> > 
> > On the Pixel4, I used rt-app to generate a task whose duty cycle increases
> > with each phase. Then, for each rt-app task placement, I measured how long
> > find_energy_efficient_cpu() took to run (means are in ns). I repeated the
> > operation several times to increase the count. Here's what I've got:
> > 
> > ┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐
> > │ Phase  │ duty-cycle  │  CPU  │     w/o LUT    │    w/  LUT    │               │
> > │        │             │       ├────────┬───────┼───────┬───────┤      Diff     │
> > │        │             │       │ Mean   │ count │ Mean  │ count │               │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │   0    │    12.5%    │ Little│ 10791  │ 3124  │ 10657 │ 3741  │  -1.2% -134ns │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │   1    │    25%      │  Mid  │ 2924   │ 3097  │ 2894  │ 3740  │  -1%  -30ns   │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │   2    │    37.5%    │  Mid  │ 2207   │ 3104  │ 2162  │ 3740  │  -2%  -45ns   │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │   3    │    50%      │  Mid  │ 1897   │ 3119  │ 1864  │ 3717  │  -1.7% -33ns  │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │        │             │  Mid  │ 1700   │  396  │ 1609  │ 1232  │  -5.4% -91ns  │
> > │   4    │    62.5%    ├───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │        │             │  Big  │ 1187   │ 2729  │ 1129  │ 2518  │  -4.9% -58ns  │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │   5    │    75%      │  Big  │  984   │ 3124  │  900  │ 3693  │  -8.5% -84ns  │
> > └────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘
> 
> Thanks for that. Do you have the stddev handy?

I do; it shows that the distribution is quite wide. I also have the 95%
confidence intervals, as follows:
                    w/o LUT (ns)            w/ LUT (ns)
                    Mean +/- 95% CI std     Mean +/- 95% CI std

Phase0:             10791 +/- 79    2262    10657 +/- 71    2240    [1]
Phase1:             2924  +/- 19    529     2894  +/- 16    513     [1]
Phase2:             2207  +/- 19    535     2162  +/- 17    515
Phase3:             1897  +/- 18    504     1864  +/- 17    515     [1]
Phase4 (Mid CPU):   1700  +/- 46    463     1609  +/- 26    458
Phase4 (Big CPU):   1187  +/- 15    407     1129  +/- 15    385
Phase5:             987   +/- 14    395     900   +/- 12    365


[1] I originally included these results because the p-value of the test I used
showed that we can confidently reject the null hypothesis that the two samples
come from the same distribution. However, the 95% confidence intervals for the
means overlap, so it is hard to draw a conclusion for those phases.
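For reference, the +/- figures are consistent with a normal-approximation interval whose half-width is 1.96 * std / sqrt(count), taking count from the first table. That formula is an assumption inferred from the numbers (it reproduces every reported half-width to within 1 ns), not a statement of how the intervals were actually produced. The minimal Python sketch below recomputes the intervals under that assumption and checks which phases overlap.

```python
import math

# Per-phase summary statistics as (mean_ns, std_ns, count) for (w/o LUT, w/ LUT).
# Means and stds come from the second table, counts from the first one.
PHASES = {
    "Phase0":         ((10791, 2262, 3124), (10657, 2240, 3741)),
    "Phase1":         ((2924, 529, 3097),   (2894, 513, 3740)),
    "Phase2":         ((2207, 535, 3104),   (2162, 515, 3740)),
    "Phase3":         ((1897, 504, 3119),   (1864, 515, 3717)),
    "Phase4 Mid CPU": ((1700, 463, 396),    (1609, 458, 1232)),
    "Phase4 Big CPU": ((1187, 407, 2729),   (1129, 385, 2518)),
    "Phase5":         ((987, 395, 3124),    (900, 365, 3693)),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def ci95(mean: float, std: float, count: int) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for the mean (assumed method)."""
    half_width = Z_95 * std / math.sqrt(count)
    return mean - half_width, mean + half_width


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Closed intervals overlap iff each one starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping_phases(phases=PHASES) -> list[str]:
    """Phases whose w/o-LUT and w/-LUT intervals overlap."""
    return [
        name
        for name, (without_lut, with_lut) in phases.items()
        if overlaps(ci95(*without_lut), ci95(*with_lut))
    ]


if __name__ == "__main__":
    for name, (without_lut, with_lut) in PHASES.items():
        lo0, hi0 = ci95(*without_lut)
        lo1, hi1 = ci95(*with_lut)
        print(f"{name:15s} w/o [{lo0:7.1f}, {hi0:7.1f}]  w/ [{lo1:7.1f}, {hi1:7.1f}]")
    print("overlapping:", overlapping_phases())
```

Under these assumptions it flags Phase0, Phase1 and Phase3, i.e. the phases marked [1]. Overlapping 95% intervals are a conservative criterion: on their own they do not rule out a significant difference between the two means.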

Interestingly, it also shows that the distribution is slightly narrower with the
LUT, in every phase except Phase3. I suppose this is because the LUT relies less
on caches than the original table walk does.
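To put a number on "slightly narrower", the short Python sketch below computes, per phase, the ratio of the standard deviation with the LUT to the one without it, using the std values from the table above. A ratio below 1 means a narrower distribution with the LUT; the ratio only describes the spread and does not test the cache explanation.

```python
# Standard deviation in ns per phase, as (w/o LUT, w/ LUT), from the table above.
STD_NS = {
    "Phase0":         (2262, 2240),
    "Phase1":         (529, 513),
    "Phase2":         (535, 515),
    "Phase3":         (504, 515),
    "Phase4 Mid CPU": (463, 458),
    "Phase4 Big CPU": (407, 385),
    "Phase5":         (395, 365),
}


def std_ratio(std_without_lut: float, std_with_lut: float) -> float:
    """Spread with the LUT relative to without it; < 1 means narrower with the LUT."""
    return std_with_lut / std_without_lut


def phases_narrower_with_lut(stds=STD_NS) -> list[str]:
    """Phases where the LUT run has the smaller standard deviation."""
    return [name for name, (wo, w) in stds.items() if std_ratio(wo, w) < 1]


if __name__ == "__main__":
    for name, (wo, w) in STD_NS.items():
        print(f"{name:15s} {std_ratio(wo, w):.2f}")
    print("narrower with LUT:", phases_narrower_with_lut())
```

The ratios range from about 0.92 (Phase5) to 1.02 (Phase3), which fits "slightly" narrower rather than a large change.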

> 
> > Notice:
> > 
> >   * The CPU column indicates which CPU ran find_energy_efficient_cpu().
> > 
> >   * I modified my patch so that no inefficient OPPs are reported. This gives
> >     a fairer comparison between the original table walk and the lookup
> >     table.
> 
> You mean to avoid the impact of the frequency selection itself? Maybe
> pinning the frequencies in the cpufreq policy could do?

Yes, that would have worked too. It might even have been better, as it would
have removed the variations in running frequency.
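Pinning the frequencies can be sketched as follows, assuming the standard cpufreq sysfs interface (/sys/devices/system/cpu/cpufreq/policyN/scaling_min_freq and scaling_max_freq, values in kHz, root required to write). The helper name pin_policy_frequency() is hypothetical. It clamps a policy to a single frequency by writing the same value to both limits, ordering the two writes so that the intermediate state never has min above max.

```python
from pathlib import Path

# Standard cpufreq sysfs location (assumption: usual layout, writes need root).
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def pin_policy_frequency(policy: int, freq_khz: int, root: Path = CPUFREQ_ROOT) -> None:
    """Hypothetical helper: clamp cpufreq policy<policy> to a single frequency in kHz."""
    policy_dir = root / f"policy{policy}"
    min_file = policy_dir / "scaling_min_freq"
    max_file = policy_dir / "scaling_max_freq"

    current_max = int(max_file.read_text().strip())
    if freq_khz > current_max:
        # Target is above the current ceiling: raise the ceiling first,
        # otherwise the new floor would temporarily exceed it.
        order = (max_file, min_file)
    else:
        # Target is at or below the current ceiling, so setting the floor
        # first cannot push it above the ceiling.
        order = (min_file, max_file)

    for path in order:
        path.write_text(f"{freq_khz}\n")
```

For example, pin_policy_frequency(0, 1800000) would pin policy0 at 1.8 GHz (an illustrative value); in practice the frequency should be one of that policy's OPPs.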

> 
> > 
> >   * I removed from the table the results that didn't have enough samples
> >     to be statistically significant.
> 
> 
> Anyways, this looks like a small but consistent gain throughout, so it's a
> win for the LUT :)
> 
> Thanks,
> Quentin