linux-kernel - Re: Regression in 4.8 - CPU speed set very low

Message-ID: <CAJZ5v0g++0JaPUEaQy54_fcA5tv5TuNZw+5mbmo47OY-dD8HoQ@mail.gmail.com>
Date:   Tue, 27 Sep 2016 00:16:13 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Larry Finger <Larry.Finger@...inger.net>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux PM list <linux-pm@...r.kernel.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
Subject: Re: Regression in 4.8 - CPU speed set very low

On Tue, Sep 27, 2016 at 12:09 AM, Larry Finger
<Larry.Finger@...inger.net> wrote:
> On 09/26/2016 04:37 PM, Rafael J. Wysocki wrote:
>>
>> On Mon, Sep 26, 2016 at 11:28 PM, Larry Finger
>> <Larry.Finger@...inger.net> wrote:
>>>
>>> On 09/26/2016 04:06 PM, Rafael J. Wysocki wrote:
>>>>
>>>>
>>>> On Monday, September 26, 2016 11:15:45 AM Larry Finger wrote:

[cut]

>>>
>>> Mostly I use a KDE applet named "System load" and look at the "average
>>> clock", but the same info is also available in /proc/cpuinfo as "cpu MHz".
>>> When the bug triggers, the system gets very slow, and the cpu fan stops
>>> even though the cpu is still busy.
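The "cpu MHz" figure mentioned above can be pulled out of /proc/cpuinfo and averaged across CPUs with a short script. This is a minimal Python sketch. It assumes the x86 layout of /proc/cpuinfo (one "cpu MHz : value" line per logical CPU). The plain mean is only an assumption about what the KDE "average clock" display computes, and the helper names are illustrative.

```python
from __future__ import annotations

from pathlib import Path
from statistics import mean


def cpu_mhz_values(cpuinfo_text: str) -> list[float]:
    """Extract every per-CPU "cpu MHz" value from /proc/cpuinfo-style text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))  # float() tolerates surrounding whitespace
    return values


def average_clock(cpuinfo_text: str) -> float | None:
    """Mean of the reported clocks, or None if no "cpu MHz" lines exist."""
    values = cpu_mhz_values(cpuinfo_text)
    return mean(values) if values else None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(f"average clock: {average_clock(cpuinfo.read_text())} MHz")
```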
>>
>>
>> That sounds like thermal throttling kicking in.
>
>
> I think it is because the cpu is idling. If thermal throttling were
> responsible, why would it run for 168 hours without failing, and then fail within 2?
>
>> What's there under /sys/class/thermal/ on your system?
>
>
> It contains the following directories:
>
> cooling_device0  cooling_device1  cooling_device2  cooling_device3
> cooling_device4  thermal_zone0  thermal_zone1
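One way to test the thermal-throttling theory is to take a snapshot of these zones and cooling devices while the slowdown is happening. The Python sketch below reads the standard thermal sysfs attributes: type and temp for each zone (temp is in millidegrees Celsius), and type, cur_state and max_state for each cooling device. Which cooling device, if any, throttles the CPU varies by platform, so a non-zero cur_state is only a hint. The helper names (thermal_snapshot, active_cooling) are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def read_int(path: Path) -> int | None:
    """Read an integer sysfs attribute; None if missing, unreadable or non-numeric."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_str(path: Path) -> str | None:
    """Read a string sysfs attribute; None if missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def thermal_snapshot(root: Path = THERMAL_ROOT) -> dict:
    """Zone temperatures (deg C) and cooling-device states under `root`."""
    zones = {}
    for zone in sorted(root.glob("thermal_zone*")):
        milli_c = read_int(zone / "temp")  # sysfs reports millidegrees Celsius
        temp_c = milli_c / 1000 if milli_c is not None else None
        zones[zone.name] = (read_str(zone / "type"), temp_c)
    cooling = {}
    for dev in sorted(root.glob("cooling_device*")):
        cooling[dev.name] = (
            read_str(dev / "type"),
            read_int(dev / "cur_state"),
            read_int(dev / "max_state"),
        )
    return {"zones": zones, "cooling": cooling}


def active_cooling(snapshot: dict) -> list[str]:
    """Names of cooling devices reporting a non-zero current state."""
    return [name for name, (_, cur, _) in snapshot["cooling"].items() if cur]


if __name__ == "__main__":
    snap = thermal_snapshot()
    for name, (kind, temp_c) in snap["zones"].items():
        print(f"{name}: {kind} {temp_c} C")
    for name, (kind, cur, top) in snap["cooling"].items():
        print(f"{name}: {kind} state {cur}/{top}")
    print("non-zero state:", active_cooling(snap) or "none")
```

Comparing a snapshot taken during a slowdown with one taken during normal operation is more informative than either alone.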
>>
>>
>>> Commit f7816ad, which had run for 7 days without showing the bug, failed
>>> after about 2 hours today. All my testing since Sept. 9 has been wasted.
>>> Oh well, that's the way it goes!
>>
>>
>> Are you confident that the issue was not reproducible before 4.8-rc2?
>> In particular, what about 4.8-rc1?
>
>
> 4.8-rc1 is definitely bad. I am now testing commit 5539204. In the bisect
> visualization, there are a number of cpufreq commits before the test case.

Maybe it's better to try to diagnose the problem instead of spending more
time on bisection.

I'd like to know whether or not 4.7 was definitely good, though.

> If it is one of them, it may be a while before I dare call this one "good".
> In one respect, that is good as I will be traveling tomorrow and Wednesday.

What does "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver" say?
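The same question can be asked of every CPU at once. This Python sketch reads scaling_driver, scaling_governor and scaling_cur_freq (reported in kHz) from each cpuN/cpufreq directory in sysfs. Typical driver values include intel_pstate or acpi-cpufreq, but the sketch does not depend on which one is loaded. Attributes that a given driver does not expose come back as None, and the helper names (cpufreq_info, all_cpus) are illustrative.

```python
from __future__ import annotations

from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_str(path: Path) -> str | None:
    """Read a sysfs attribute; None if missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def cpufreq_info(cpu_dir: Path) -> dict:
    """Scaling driver, governor and current frequency (MHz) for one CPU."""
    cpufreq = cpu_dir / "cpufreq"
    cur_khz = read_str(cpufreq / "scaling_cur_freq")  # reported in kHz
    return {
        "driver": read_str(cpufreq / "scaling_driver"),
        "governor": read_str(cpufreq / "scaling_governor"),
        "cur_mhz": int(cur_khz) / 1000 if cur_khz and cur_khz.isdigit() else None,
    }


def all_cpus(root: Path = CPU_ROOT) -> dict:
    """Map each cpuN directory name under `root` to its cpufreq_info()."""
    return {
        d.name: cpufreq_info(d)
        for d in sorted(root.glob("cpu[0-9]*"))
        if d.is_dir()
    }


if __name__ == "__main__":
    for cpu, info in all_cpus().items():
        print(cpu, info)
```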

Thanks,
Rafael