Message-ID: <871r29tvdj.fsf@riseup.net>
Date: Sat, 18 Dec 2021 14:12:56 -0800
From: Francisco Jerez <currojerez@...eup.net>
To: Julia Lawall <julia.lawall@...ia.fr>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Len Brown <lenb@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Linux PM <linux-pm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: cpufreq: intel_pstate: map utilization into the pstate range
Julia Lawall <julia.lawall@...ia.fr> writes:
> On Sat, 18 Dec 2021, Francisco Jerez wrote:
>
>> Julia Lawall <julia.lawall@...ia.fr> writes:
>>
>> >> As you can see in intel_pstate.c, min_pstate is initialized on core
>> >> platforms from MSR_PLATFORM_INFO[47:40], which is "Maximum Efficiency
>> >> Ratio (R/O)". However that seems to deviate massively from the most
>> >> efficient ratio on your system, which may indicate a firmware bug, some
>> >> sort of clock gating problem, or an issue with the way that
>> >> intel_pstate.c processes this information.
>> >
>> > I'm not sure to understand the bug part. min_pstate gives the frequency
>> > that I find as the minimum frequency when I look for the specifications of
>> > the CPU. Should one expect that it should be something different?
>> >
>>
>> I'd expect the minimum frequency on your processor specification to
>> roughly match the "Maximum Efficiency Ratio (R/O)" value from that MSR,
>> since there's little reason to claim your processor can be clocked down
>> to a frequency which is inherently inefficient /and/ slower than the
>> maximum efficiency ratio -- In fact they both seem to match in your
>> system, they're just nowhere close to the frequency which is actually
>> most efficient, which smells like a bug, like your processor
>> misreporting what the most efficient frequency is, or it deviating from
>> the expected one due to your CPU static power consumption being greater
>> than it would be expected to be under ideal conditions -- E.g. due to
>> some sort of clock gating issue, possibly due to a software bug, or due
>> to our scheduling of such workloads with a large amount of lightly
>> loaded threads being unnecessarily inefficient which could also be
>> preventing most of your CPU cores from ever being clock-gated even
>> though your processor may be sitting idle for a large fraction of their
>> runtime.
>
> The original mail has results from two different machines: Intel 6130
> (skylake) and Intel 5218 (cascade lake). I have access to another cluster
> of 6130s and 5218s. I can try them.
>
> I tried 5.9 in which I just commented out the schedutil code to make
> frequency requests. I only tested avrora (tiny pauses) and h2 (longer
> pauses) and in both case the execution is almost entirely in the turbo
> frequencies.
>
> I'm not sure to understand the term "clock-gated". What C state does that
> correspond to? The turbostat output for one run of avrora is below.
>
I didn't have any specific C1+ state in mind; most of the deeper ones
implement some sort of clock gating among other optimizations. I was
just wondering whether some sort of software bug and/or the highly
intermittent CPU utilization pattern of these workloads is preventing
most of your CPU cores from entering deep sleep states. See below.
> julia
>
> 78.062895 sec
> Package Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI POLL C1 C1E C6 POLL% C1% C1E% C6% CPU%c1 CPU%c6 CoreTmp PkgTmp Pkg%pc2 Pkg%pc6 Pkg_J RAM_J PKG_% RAM_%
> - - - 31 2.95 1065 2096 156134 0 1971 155458 2956270 657130 0.00 0.20 4.78 92.26 14.75 82.31 40 41 45.14 0.04 4747.52 2509.05 0.00 0.00
> 0 0 0 13 1.15 1132 2095 11360 0 0 2 39 19209 0.00 0.00 0.01 99.01 8.02 90.83 39 41 90.24 0.04 2266.04 1346.09 0.00 0.00
This seems suspicious: the Pkg%pc6 (0.04) and Pkg_J (2266.04) values in
the package #0 row above.
I hadn't understood that you're running this on a dual-socket system
until I looked at these results. It seems like package #0 is doing
pretty much nothing according to the stats below, but it's still
consuming nearly half of your energy, apparently because the idle
package #0 isn't entering deep sleep states (Pkg%pc6 above is close to
0%). That could explain your unexpectedly high static power consumption
and the deviation of the real maximum efficiency frequency from the one
reported by your processor, since the reported maximum efficiency ratio
cannot possibly take into account the existence of a second CPU package
with dysfunctional idle management.
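
As a quick sanity check of the "nearly half" figure, here is a minimal
Python sketch that uses only the Pkg_J values and the run duration from
the summary rows quoted above. The function name is made up for
illustration, and dividing energy by wall-clock time only yields an
average package power, not something turbostat reports directly.

```python
# Figures copied from the turbostat output quoted in this thread.
DURATION_S = 78.062895                # run duration in seconds
PKG_J = {0: 2266.04, 1: 2481.49}      # Pkg_J column, one entry per package


def package_energy_report(pkg_joules, duration_s):
    """Hypothetical helper: per-package energy share and average power."""
    total = sum(pkg_joules.values())
    return {
        pkg: {"share": joules / total, "avg_watts": joules / duration_s}
        for pkg, joules in pkg_joules.items()
    }


report = package_energy_report(PKG_J, DURATION_S)
for pkg, r in sorted(report.items()):
    print(f"package {pkg}: {r['share']:.1%} of package energy, "
          f"{r['avg_watts']:.1f} W average")
```

With these inputs the mostly idle package #0 accounts for a little under
48% of the package energy, i.e. roughly 29 W on average over the run.
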
I'm guessing that if you fully disable one of your CPU packages and
repeat the previous experiment forcing various P-states between 10 and
37 you should get a maximum efficiency ratio closer to the theoretical
one for this CPU?
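
One possible way to take a whole package offline for that experiment is
CPU hotplug through sysfs. The sketch below is only an illustration under
assumptions: the helper names are hypothetical, it relies on the standard
topology/physical_package_id and online files under
/sys/devices/system/cpu, it needs root to do anything for real, and it
defaults to a dry run. CPUs without an online file (often cpu0 on x86)
are skipped, so the package containing cpu0 may not go fully offline.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def cpus_in_package(package_id, root=SYSFS_CPU):
    """Return the sorted logical CPU numbers that belong to package_id."""
    cpus = []
    # "cpu[0-9]*" matches cpu0, cpu12, ... but not cpufreq or cpuidle.
    for cpu_dir in root.glob("cpu[0-9]*"):
        pkg_file = cpu_dir / "topology" / "physical_package_id"
        if pkg_file.exists() and int(pkg_file.read_text()) == package_id:
            cpus.append(int(cpu_dir.name[3:]))
    return sorted(cpus)


def set_package_online(package_id, online, root=SYSFS_CPU, dry_run=True):
    """Write "0" or "1" to each CPU's online file in the given package.

    Returns the list of (path, value) pairs that were, or in a dry run
    would be, written.
    """
    value = "1" if online else "0"
    actions = []
    for cpu in cpus_in_package(package_id, root):
        online_file = root / f"cpu{cpu}" / "online"
        if not online_file.exists():
            continue  # this CPU cannot be hot-unplugged
        actions.append((str(online_file), value))
        if not dry_run:
            online_file.write_text(value)
    return actions
```

Calling set_package_online(0, False) lists what would be written;
passing dry_run=False performs the writes. It is probably worth saving
the result of cpus_in_package() beforehand, since the topology files of
an offlined CPU may no longer be available when bringing it back.
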
> 0 0 32 1 0.09 1001 2095 37 0 0 0 0 42 0.00 0.00 0.00 100.00 9.08
> 0 1 4 0 0.04 1000 2095 57 0 0 0 1 133 0.00 0.00 0.00 99.96 0.08 99.88 38
> 0 1 36 0 0.00 1000 2095 35 0 0 0 0 40 0.00 0.00 0.00 100.00 0.12
> 0 2 8 0 0.03 1000 2095 64 0 0 0 1 124 0.00 0.00 0.00 99.97 0.08 99.89 38
> 0 2 40 0 0.00 1000 2095 36 0 0 0 0 40 0.00 0.00 0.00 100.00 0.10
> 0 3 12 0 0.00 1000 2095 42 0 0 0 0 71 0.00 0.00 0.00 100.00 0.14 99.86 38
> 0 3 44 1 0.09 1000 2095 63 0 0 0 0 65 0.00 0.00 0.00 99.91 0.05
> 0 4 14 0 0.00 1010 2095 38 0 0 0 1 41 0.00 0.00 0.00 100.00 0.04 99.96 39
> 0 4 46 0 0.00 1011 2095 36 0 0 0 1 41 0.00 0.00 0.00 100.00 0.04
> 0 5 10 0 0.01 1084 2095 39 0 0 0 0 58 0.00 0.00 0.00 99.99 0.04 99.95 38
> 0 5 42 0 0.00 1114 2095 35 0 0 0 0 39 0.00 0.00 0.00 100.00 0.05
> 0 6 6 0 0.03 1005 2095 89 0 0 0 1 116 0.00 0.00 0.00 99.97 0.07 99.90 39
> 0 6 38 0 0.00 1000 2095 38 0 0 0 0 41 0.00 0.00 0.00 100.00 0.10
> 0 7 2 0 0.05 1001 2095 59 0 0 0 1 133 0.00 0.00 0.00 99.95 0.09 99.86 40
> 0 7 34 0 0.00 1000 2095 39 0 0 0 0 65 0.00 0.00 0.00 100.00 0.13
> 0 8 16 0 0.00 1000 2095 43 0 0 0 0 47 0.00 0.00 0.00 100.00 0.04 99.96 38
> 0 8 48 0 0.00 1000 2095 37 0 0 0 0 41 0.00 0.00 0.00 100.00 0.04
> 0 9 20 0 0.00 1000 2095 33 0 0 0 0 37 0.00 0.00 0.00 100.00 0.03 99.97 38
> 0 9 52 0 0.00 1000 2095 33 0 0 0 0 36 0.00 0.00 0.00 100.00 0.03
> 0 10 24 0 0.00 1000 2095 36 0 0 0 1 40 0.00 0.00 0.00 100.00 0.03 99.96 39
> 0 10 56 0 0.00 1000 2095 37 0 0 0 1 38 0.00 0.00 0.00 100.00 0.03
> 0 11 28 0 0.00 1002 2095 35 0 0 0 1 37 0.00 0.00 0.00 100.00 0.03 99.97 38
> 0 11 60 0 0.00 1004 2095 34 0 0 0 0 36 0.00 0.00 0.00 100.00 0.03
> 0 12 30 0 0.00 1001 2095 35 0 0 0 0 40 0.00 0.00 0.00 100.00 0.11 99.88 38
> 0 12 62 0 0.01 1000 2095 197 0 0 0 0 197 0.00 0.00 0.00 99.99 0.10
> 0 13 26 0 0.00 1000 2095 37 0 0 0 0 41 0.00 0.00 0.00 100.00 0.03 99.97 39
> 0 13 58 0 0.00 1000 2095 38 0 0 0 0 40 0.00 0.00 0.00 100.00 0.03
> 0 14 22 0 0.01 1000 2095 149 0 1 2 0 142 0.00 0.01 0.00 99.99 0.07 99.92 39
> 0 14 54 0 0.00 1000 2095 35 0 0 0 0 38 0.00 0.00 0.00 100.00 0.07
> 0 15 18 0 0.00 1000 2095 33 0 0 0 0 36 0.00 0.00 0.00 100.00 0.03 99.97 39
> 0 15 50 0 0.00 1000 2095 34 0 0 0 0 38 0.00 0.00 0.00 100.00 0.03
> 1 0 1 32 3.23 1008 2095 2385 0 31 3190 45025 10144 0.00 0.28 4.68 91.99 11.21 85.56 32 35 0.04 0.04 2481.49 1162.96 0.00 0.00
> 1 0 33 9 0.63 1404 2095 12206 0 5 162 2480 10283 0.00 0.04 0.75 98.64 13.81
> 1 1 5 1 0.07 1384 2095 236 0 0 38 24 314 0.00 0.09 0.06 99.77 4.66 95.27 33
> 1 1 37 81 3.93 2060 2095 1254 0 5 40 59 683 0.00 0.01 0.02 96.05 0.80
> 1 2 9 37 3.46 1067 2095 2396 0 29 2256 55406 11731 0.00 0.17 6.02 90.54 54.10 42.44 31
> 1 2 41 151 14.51 1042 2095 10447 0 135 10494 248077 42327 0.01 0.87 26.57 58.84 43.05
> 1 3 13 110 10.47 1053 2095 7120 0 120 9218 168938 33884 0.01 0.77 16.63 72.68 42.58 46.95 32
> 1 3 45 69 6.76 1021 2095 4730 0 66 5598 115410 23447 0.00 0.44 12.06 81.12 46.29
> 1 4 15 112 10.64 1056 2095 7204 0 116 8831 171423 37754 0.01 0.70 17.56 71.67 28.01 61.35 33
> 1 4 47 18 1.80 1006 2095 1771 0 13 915 29315 6564 0.00 0.07 3.20 95.03 36.85
> 1 5 11 63 5.96 1065 2095 4090 0 58 6449 99015 18955 0.00 0.45 10.27 83.64 31.24 62.80 31
> 1 5 43 72 7.11 1016 2095 4794 0 73 6203 115361 26494 0.00 0.48 11.79 81.02 30.09
> 1 6 7 35 3.39 1022 2095 2328 0 45 3377 52721 13759 0.00 0.27 5.10 91.43 25.84 70.77 32
> 1 6 39 67 6.09 1096 2095 4483 0 52 3696 94964 19366 0.00 0.30 10.32 83.61 23.14
> 1 7 3 1 0.06 1395 2095 91 0 0 0 1 167 0.00 0.00 0.00 99.95 25.36 74.58 35
> 1 7 35 83 8.16 1024 2095 5785 0 100 7398 134640 27428 0.00 0.56 13.39 78.34 17.26
> 1 8 17 46 4.49 1016 2095 3229 0 52 3048 74914 16010 0.00 0.27 8.29 87.19 29.71 65.80 33
> 1 8 49 64 6.12 1052 2095 4210 0 89 5782 100570 21463 0.00 0.42 10.63 83.17 28.08
> 1 9 21 73 7.02 1036 2095 4917 0 64 5786 109887 21939 0.00 0.55 11.61 81.18 22.10 70.88 33
> 1 9 53 64 6.33 1012 2095 4074 0 69 5957 97596 20580 0.00 0.51 9.78 83.74 22.79
> 1 10 25 26 2.58 1013 2095 1825 0 22 2124 42630 8627 0.00 0.17 4.17 93.24 53.91 43.52 33
> 1 10 57 159 15.59 1022 2095 10951 0 175 14237 256828 56810 0.01 1.10 26.00 58.16 40.89
> 1 11 29 112 10.54 1065 2095 7462 0 126 9548 179206 39821 0.01 0.85 18.49 70.71 29.46 60.00 31
> 1 11 61 29 2.89 1011 2095 2002 0 24 2468 45558 10288 0.00 0.20 4.71 92.36 37.11
> 1 12 31 37 3.66 1011 2095 2596 0 79 3161 61027 13292 0.00 0.24 6.48 89.79 23.75 72.59 32
> 1 12 63 56 5.08 1107 2095 3789 0 62 4777 79133 17089 0.00 0.41 7.91 86.86 22.31
> 1 13 27 12 1.14 1045 2095 1477 0 16 888 18744 3250 0.00 0.06 2.18 96.70 21.23 77.64 32
> 1 13 59 60 5.81 1038 2095 5230 0 60 4936 87225 21402 0.00 0.41 8.95 85.14 16.55
> 1 14 23 28 2.75 1024 2095 2008 0 20 1839 47417 9177 0.00 0.13 5.08 92.21 34.18 63.07 32
> 1 14 55 106 9.58 1105 2095 6292 0 89 7182 141379 31354 0.00 0.63 14.45 75.81 27.36
> 1 15 19 118 11.65 1012 2095 7872 0 121 10014 193186 40448 0.01 0.80 19.53 68.68 37.53 50.82 32
> 1 15 51 59 5.58 1059 2095 3967 0 54 5842 88063 21138 0.00 0.39 9.12 85.23 43.60