Message-ID: <87d1ywxwgm.fsf@smart-cactus.org>
Date:	Sun, 09 Aug 2015 18:23:05 +0200
From:	Ben Gamari <ben@...rt-cactus.org>
To:	LKML <linux-kernel@...r.kernel.org>
Subject: intel_pstate throttling stuck at low frequency


Hello all,

I have a Dell Latitude E7440 running Ubuntu 15.04 which seems to be
suffering from the intel_pstate driver getting stuck in a throttled
state while under load. The issue typically occurs on warm days when the
machine is under load for an extended period of time (e.g. while
compiling).

Under these conditions performance gradually deteriorates as the CPU
frequency creeps lower and lower. In this dmesg log [1] from a recent
incident, we see that there were a couple of core and package throttling
events. This in itself isn't problematic; what is troubling is that
despite the fact that the temperature quickly returned to normal, the
CPU frequency remained at just below 400 MHz for the next hour or so
while I gathered data on the issue with the system under load. The
temperature held steady in the low 60s (degrees Celsius) for this duration.
After I finished gathering data I killed the CPU-intensive process and
it took over ten minutes for frequency scaling to behave normally again,
eventually scaling up to 3.3 GHz when necessary. I experience these
sorts of events fairly regularly when placing the machine under load.

It seems to make no difference whether I use the powersave or
performance governor. This is strange as most accounts I have seen claim
that the performance governor unconditionally sets the CPU frequency to
its maximum. Even if there were a thermal limit, the system temperature
in this case isn't terribly unreasonable (60 to 65 degrees Celsius).

I've attached some further information gathered during the
incident, which occurred with a 4.2-rc5 kernel, although I have been
experiencing issues of this nature ever since I bought the machine
(mostly in the summer).

How would one go about tracking this issue down? The kernel tree seems to
be rather lacking in documentation describing what factors enter
intel_pstate's scaling decisions. Is there any way to get better
visibility into this process?
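
One possible userspace starting point, offered as a rough sketch rather
than anything taken from kernel documentation, is to poll the per-CPU
thermal throttle counters alongside the reported frequency. It assumes
the usual sysfs locations under each cpuN directory
(thermal_throttle/core_throttle_count,
thermal_throttle/package_throttle_count and cpufreq/scaling_cur_freq),
any of which may be absent depending on kernel configuration. The
function names are hypothetical.

```python
import time
from pathlib import Path

COUNTERS = ("core_throttle_count", "package_throttle_count")


def read_int(path):
    """Return the integer stored in a sysfs file, or None if missing/unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def snapshot(cpu_root="/sys/devices/system/cpu"):
    """Collect throttle counters and current frequency for every cpuN directory."""
    result = {}
    for cpu_dir in sorted(Path(cpu_root).glob("cpu[0-9]*")):
        values = {
            name: read_int(cpu_dir / "thermal_throttle" / name)
            for name in COUNTERS
        }
        # scaling_cur_freq is reported in kHz.
        values["scaling_cur_freq_khz"] = read_int(
            cpu_dir / "cpufreq" / "scaling_cur_freq"
        )
        result[cpu_dir.name] = values
    return result


def throttle_delta(before, after):
    """Return how much each throttle counter grew between two snapshots."""
    deltas = {}
    for cpu, now in after.items():
        then = before.get(cpu)
        if then is None:
            continue
        deltas[cpu] = {
            name: now[name] - then[name]
            for name in COUNTERS
            if now[name] is not None and then[name] is not None
        }
    return deltas


def report(interval_s=10.0, cpu_root="/sys/devices/system/cpu"):
    """Print new throttle events and the current frequency after one interval."""
    before = snapshot(cpu_root)
    time.sleep(interval_s)
    after = snapshot(cpu_root)
    deltas = throttle_delta(before, after)
    for cpu, values in after.items():
        print(cpu, deltas.get(cpu, {}),
              "cur_freq_khz =", values["scaling_cur_freq_khz"])
```

Calling report() repeatedly while the machine is under load would show
whether the throttle counters are still increasing during the period in
which the frequency stays low.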

Any ideas on what might be going wrong here?

Cheers,

- Ben


[1] https://gist.github.com/bgamari/ae032532a13fa52a8a69


$ cpupower monitor
    |Nehalem                    || SandyBridge        || HaswellExtended    || Mperf              || Idle_Stats
CPU | C3   | C6   | PC3  | PC6  || C7   | PC2  | PC7  || PC8  | PC9  | PC10 || C0   | Cx   | Freq || POLL | C1-H | C1E- | C3-H | C6-H | C7s- | C8-H | C9-H | C10-
   0|  7.04|  5.22|  0.00|  0.00|| 31.01| 18.16|  0.00||  0.00|  0.00|  0.00|| 40.21| 59.79|   388||  0.00|  0.04|  0.57|  5.08|  3.23| 13.00|  8.80| 29.25|  0.00
   2|  7.04|  5.22|  0.00|  0.00|| 31.01| 18.16|  0.00||  0.00|  0.00|  0.00|| 27.59| 72.41|   379||  0.00|  0.01|  0.20|  7.76|  5.16| 17.02| 19.51| 21.57|  1.15
   1|  3.59|  2.92|  0.00|  0.00|| 41.40| 18.16|  0.00||  0.00|  0.00|  0.00|| 32.14| 67.86|   394||  0.00|  0.01|  0.26|  5.21|  4.30| 24.69|  6.45| 24.22|  2.83
   3|  3.59|  2.92|  0.00|  0.00|| 41.40| 18.16|  0.00||  0.00|  0.00|  0.00|| 26.58| 73.42|   367||  0.00|  0.00|  0.11|  1.87|  1.14| 30.36|  5.54| 32.62|  1.95

$ cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.30 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.30 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 380 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +25.0C  (crit = +107.0C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +63.0C  (high = +100.0C, crit = +100.0C)
Core 0:         +62.0C  (high = +100.0C, crit = +100.0C)
Core 1:         +63.0C  (high = +100.0C, crit = +100.0C)

dell_smm-virtual-0
Adapter: Virtual device
Processor Fan: 6710 RPM
CPU:            +62.0C
Ambient:        +49.0C
SODIMM:         +52.0C

$ cd /sys/devices/system/cpu/intel_pstate
$ cat {max,min}_perf_pct
100
100
$ cat no_turbo num_pstates turbo_pct 
0
26
24

$ cd /sys/kernel/debug/pstate_snb
$ cat pgain_pct
20
$ cat igain_pct
0
$ cat dgain_pct
0
$ cd ../pkg_temp_thermal
$ cat pkg_thres_*
0
0
$ cd ../intel_powerclamp
$ cat powerclamp_calib 
controlling cpu: 0
pct confidence steady dynamic (compensation)
0	0	0	0
1	0	0	0
2	0	0	0
... (remaining lines also all zeros)



$ sudo turbostat 
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     175   47.37     369    2694
       0     210   55.23     380    2698
       2     219   61.70     354    2693
       1     139   36.09     385    2692
       3     131   36.45     360    2694
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     167   45.63     365    2696
       0     108   28.06     385    2695
       2     314   89.24     352    2698
       1     130   33.43     388    2698
       3     115   31.75     364    2694
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     174   46.53     373    2694
       0     176   45.48     386    2696
       2     200   55.75     360    2694
       1     179   46.42     385    2694
       3     139   38.47     362    2694
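
As a consistency check on the turbostat samples above, Avg_MHz should be
roughly %Busy (as a fraction) multiplied by Bzy_MHz. The sketch below
parses output in the five-column layout shown here, checks that
relation, and flags rows whose Bzy_MHz falls below the 800 MHz hardware
minimum reported by cpupower frequency-info. The function names and the
800 MHz default are assumptions made for illustration.

```python
SAMPLE = """\
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     175   47.37     369    2694
       0     210   55.23     380    2698
       2     219   61.70     354    2693
       1     139   36.09     385    2692
       3     131   36.45     360    2694
"""


def parse_turbostat(text):
    """Parse 'CPU Avg_MHz %Busy Bzy_MHz TSC_MHz' rows, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "CPU":
            continue
        cpu, avg_mhz, busy_pct, bzy_mhz, tsc_mhz = fields
        rows.append({
            "cpu": cpu,  # "-" is the summary row across all CPUs
            "avg_mhz": float(avg_mhz),
            "busy_pct": float(busy_pct),
            "bzy_mhz": float(bzy_mhz),
            "tsc_mhz": float(tsc_mhz),
        })
    return rows


def analyse(rows, hw_min_mhz=800.0):
    """Compare Avg_MHz with %Busy * Bzy_MHz and flag sub-minimum busy clocks."""
    results = []
    for row in rows:
        expected_avg = row["busy_pct"] / 100.0 * row["bzy_mhz"]
        results.append({
            "cpu": row["cpu"],
            "expected_avg_mhz": expected_avg,
            "avg_error_mhz": row["avg_mhz"] - expected_avg,
            "below_hw_min": row["bzy_mhz"] < hw_min_mhz,
        })
    return results


for result in analyse(parse_turbostat(SAMPLE)):
    print(result)
```

For the first sample above, every row is flagged as running below the
reported hardware minimum while busy.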

$ cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu

Analyzing CPU 0:
Number of idle states: 9
Available idle states: POLL C1-HSW C1E-HSW C3-HSW C6-HSW C7s-HSW C8-HSW C9-HSW C10-HSW
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 19629
Duration: 4903415
C1-HSW:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 12066075
Duration: 2316078427
C1E-HSW:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 1437624
Duration: 497058866
C3-HSW:
Flags/Description: MWAIT 0x10
Latency: 33
Usage: 1664168
Duration: 916288273
C6-HSW:
Flags/Description: MWAIT 0x20
Latency: 133
Usage: 456853
Duration: 353643717
C7s-HSW:
Flags/Description: MWAIT 0x32
Latency: 166
Usage: 1714991
Duration: 1671456695
C8-HSW:
Flags/Description: MWAIT 0x40
Latency: 300
Usage: 1435877
Duration: 1966505031
C9-HSW:
Flags/Description: MWAIT 0x50
Latency: 600
Usage: 1565954
Duration: 3739218646
C10-HSW:
Flags/Description: MWAIT 0x60
Latency: 2600
Usage: 118301
Duration: 955949684

