Message-ID: <20210428144609.GB71893@e120877-lin.cambridge.arm.com>
Date: Wed, 28 Apr 2021 15:46:10 +0100
From: Vincent Donnefort <vincent.donnefort@....com>
To: Quentin Perret <qperret@...gle.com>
Cc: peterz@...radead.org, rjw@...ysocki.net, viresh.kumar@...aro.org,
vincent.guittot@...aro.org, linux-kernel@...r.kernel.org,
ionela.voinescu@....com, lukasz.luba@....com,
dietmar.eggemann@....com
Subject: Re: [PATCH] PM / EM: Inefficient OPPs detection
> >
> > On the Pixel4, I used rt-app to generate a task whom duty cycle is getting
> > higher for each phase. Then for each rt-app task placement, I measured how long
> > find_energy_efficient_cpu() took to run. I repeated the operation several
> > times to increase the count. Here's what I've got:
> >
> > ┌────────┬─────────────┬───────┬────────────────┬───────────────┬───────────────┐
> > │ Phase │ duty-cycle │ CPU │ w/o LUT │ w/ LUT │ │
> > │ │ │ ├────────┬───────┼───────┬───────┤ Diff │
> > │ │ │ │ Mean │ count │ Mean │ count │ │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │ 0 │ 12.5% │ Little│ 10791 │ 3124 │ 10657 │ 3741 │ -1.2% -134ns │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │ 1 │ 25% │ Mid │ 2924 │ 3097 │ 2894 │ 3740 │ -1% -30ns │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │ 2 │ 37.5% │ Mid │ 2207 │ 3104 │ 2162 │ 3740 │ -2% -45ns │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │ 3 │ 50% │ Mid │ 1897 │ 3119 │ 1864 │ 3717 │ -1.7% -33ns │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │ │ │ Mid │ 1700 │ 396 │ 1609 │ 1232 │ -5.4% -91ns │
> > │ 4 │ 62.5% ├───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │ │ │ Big │ 1187 │ 2729 │ 1129 │ 2518 │ -4.9% -58ns │
> > ├────────┼─────────────┼───────┼────────┼───────┼───────┼───────┼───────────────┤
> > │ 5 │ 75% │ Big │ 984 │ 3124 │ 900 │ 3693 │ -8.5% -84ns │
> > └────────┴─────────────┴───────┴────────┴───────┴───────┴───────┴───────────────┘
>
> Thanks for that. Do you have the stddev handy?
I do, and it shows that the distribution is quite wide. I also have a 95%
confidence interval for the mean, as follows:
                    w/o LUT              w/ LUT
                 Mean         std     Mean         std
Phase0:          10791 +/-79  2262    10657 +/-71  2240  [1]
Phase1:           2924 +/-19   529     2894 +/-16   513  [1]
Phase2:           2207 +/-19   535     2162 +/-17   515
Phase3:           1897 +/-18   504     1864 +/-17   515  [1]
Phase4: Mid CPU   1700 +/-46   463     1609 +/-26   458
Phase4: Big CPU   1187 +/-15   407     1129 +/-15   385
Phase5:            987 +/-14   395      900 +/-12   365
[1] I originally included these results because the p-value of the test I used
showed that we can confidently reject the null hypothesis that the two samples
come from the same distribution. However, the confidence intervals for the
means overlap, so it is hard to draw a conclusion for those phases.
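
For anyone who wants to redo the overlap check, here is a minimal sketch. It
assumes the intervals are a plain normal approximation (mean +/- 1.96 *
std / sqrt(count)), which the message does not state but which reproduces the
+/-79 of Phase0. The helper names are made up for illustration; the means,
stds and counts are the ones quoted in this thread.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumed method)


def ci95(mean, std, count):
    """Return (low, high) of the 95% confidence interval for the mean."""
    half_width = Z_95 * std / math.sqrt(count)
    return (mean - half_width, mean + half_width)


def overlaps(a, b):
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# (mean, std, count) in ns, copied from the tables in this thread.
phase0_without_lut = ci95(10791, 2262, 3124)
phase0_with_lut = ci95(10657, 2240, 3741)
phase2_without_lut = ci95(2207, 535, 3104)
phase2_with_lut = ci95(2162, 515, 3740)

# Phase0 is flagged with [1] above, Phase2 is not.
print("Phase0 overlap:", overlaps(phase0_without_lut, phase0_with_lut))
print("Phase2 overlap:", overlaps(phase2_without_lut, phase2_with_lut))
```

Overlapping intervals do not prove the means are equal; they only mean this
particular check cannot separate them.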

Interestingly, the distribution is slightly narrower with the LUT. I suppose
this is because the LUT relies less on caches than the original table walk
does.
>
> > Notice:
> >
> > * The CPU column describes which CPU ran the find_energy_efficient()
> > function.
> >
> > * I modified my patch so that no inefficient OPPs are reported. This is to
> > have a fairer comparison between the original table walk and the lookup
> > table.
>
> You mean to avoid the impact of the frequency selection itself? Maybe
> pinning the frequencies in the cpufreq policy could do?
Yes, that would have worked too. It might even have been better, as it would
have removed the variations in running frequency.
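
As a rough sketch of what pinning could look like: the cpufreq sysfs files
scaling_min_freq and scaling_max_freq bound the frequency of a policy, so
writing the same value to both should hold it there. The helper name, the
policy path and the frequency below are hypothetical, and this is not what was
done for the measurements above.

```python
from pathlib import Path


def pin_frequency(policy_dir, freq_khz):
    """Write the same frequency (in kHz) as both bounds of a cpufreq policy.

    policy_dir is a cpufreq policy directory, for example
    /sys/devices/system/cpu/cpufreq/policy0 (hypothetical choice).
    """
    policy = Path(policy_dir)
    # Assumption: the write order does not matter for this illustration.
    for name in ("scaling_max_freq", "scaling_min_freq"):
        (policy / name).write_text(f"{freq_khz}\n")


# Hypothetical usage (needs root; the frequency must be one the policy
# supports, see scaling_available_frequencies where the driver exposes it):
# pin_frequency("/sys/devices/system/cpu/cpufreq/policy0", 1200000)
```

Each policy (little, mid, big) would need its own call, with a frequency taken
from its own table.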
>
> >
> > * I removed from the table results that didn't have enough count to be
> > statistically significant.
>
>
> Anyways, this looks like a small but consistent gain throughout, so it's a
> win for the LUT :)
>
> Thanks,
> Quentin