Date:	Sun, 09 Mar 2014 10:28:43 -0700
From:	Guenter Roeck <linux@...ck-us.net>
To:	Manuel Krause <manuelkrause@...scape.net>,
	linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
CC:	Jean Delvare <jdelvare@...e.de>, lm-sensors@...sensors.org,
	"Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: 3.13.?: Strange / dangerous fan policy...

On 03/08/2014 04:10 PM, Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>> Hi, and thanks for the quick response!
>>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>>> running.
>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>> without
>>>>> any extra work.
>>>>> --
>>>>> # sensors
>>>>> acpitz-virtual-0
>>>>> Adapter: Virtual device
>>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>>
>>>>> coretemp-isa-0000
>>>>> Adapter: ISA adapter
>>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>> --
>>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>>> sensor.
>>>>> This is with 3.12.13 with my normal workload.
>>>>>
>>>>> Please trust my above-mentioned values of 94°C vs. 74°C, as I
>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>> notebook's
>>>>> casing.
>>>>
>>>> Understood. Unfortunately, we'll need to get information
>>>> from the new kernel to be able to track down the problem.
>>>
>>> Indeed. Not only the run-time temperatures, but also the high
>>> and crit
>>> limits.
>>>
>>>>> But I'd be willing to test any patch that might improve things.
>>>>
>>>> So far I have no idea what is going on. I don't see anything
>>>> in the
>>>> drivers providing above data that would explain the behavior,
>>>> but I might be missing something.
>>>
>>> Looks like a regression in the acpi subsystem or in power
>>> management,
>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>> responsible for the actual temperatures.
>>>
>>
>> I would agree. I don't think we have enough information to be sure,
>> though. There might be some unintended interaction or interference.
>>
>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
>> to THERM). nouveau does export pwm and fan control information,
>> so any change in that code may have unintended side effects.
>> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>> use devm_hwmon_register_with_groups) could have the observed impact,
>> as it is purely passive, but I would rather be safe than sorry.
>>
>> This problem has now been filed in bugzilla as
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>
>> Guenter
>>
>
> Sorry for being late; I had to search for and collect a lot of info for you...
> I hope you don't mind me putting it all into one answer, with all of you CCed.
>
> My GFX is a GM45 Intel (mobile), shared memory, running the opensource Mesa drivers/extensions.
> kernel-module: i915
>
> According to the output of 'cpupower': I have
> CPUidle driver: acpi_idle
> CPUidle governor: menu
>
> CPUfreq:
>    driver: acpi-cpufreq
>    available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
>
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +41.0°C  (crit = +256.0°C)
> temp2:        +92.0°C  (crit = +110.0°C)
> temp3:        +71.0°C  (crit = +105.0°C)
> temp4:        +26.5°C  (crit = +110.0°C)
> temp5:        +25.0°C  (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
>
> This is from a critical ("smelly") situation today: kernel compilation, fan at 100%.
> --
>
> Additional findings:
>
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
>
> With the 3.13.x (.5+) kernels, the cooling settings first gathered at bootup stay in effect forever. That is, rebooting a hot system gives an FDTZ of 45°C or more and causes no problems, as the system cools sufficiently (even for kernel compiling here). If FDTZ reads 25°C at bootup, the system goes into emergency cooling at some point. The same happens after a suspend/resume.
>
> Kernel 3.12.13 adjusts the cooling on its own, and appropriately.
>

Hi Manuel,

thanks a lot for the additional information.

I added this exchange to bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=71711).
This is pretty much all I can do at this point; I have no idea what
is going on. Some change in ACPI would be my guess, but I did not see
anything catching my eye when looking through the ACPI code.

Guenter
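
Editorial sketch (not part of the original message): the thread asks for
data from the affected kernel, including limits as well as run-time
temperatures. One way to collect comparable snapshots under 3.12.13 and
3.13.x might be to dump the kernel's thermal sysfs tree. The script below
is a minimal sketch under assumptions: it assumes the standard
/sys/class/thermal layout (thermal_zone*/type, temp, trip_point_*_temp,
and cooling_device*/cur_state, max_state), and the function and field
names are made up for illustration. Whether these zones line up
one-to-one with the temp1..temp5 readings from 'sensors' on this machine
is not confirmed by the thread.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def _millideg_to_c(raw):
    """Convert a millidegree-Celsius string to float degrees, or None."""
    try:
        return int(raw) / 1000.0
    except (TypeError, ValueError):
        return None


def thermal_snapshot(root="/sys/class/thermal"):
    """Collect thermal zones (with trip points) and cooling device states.

    'root' is a parameter so the function can be pointed at a copy of the
    tree; the default is the usual sysfs location.
    """
    root = Path(root)
    zones = []
    for zone in sorted(root.glob("thermal_zone*")):
        trips = []
        for temp_file in sorted(zone.glob("trip_point_*_temp")):
            type_file = temp_file.with_name(
                temp_file.name.replace("_temp", "_type"))
            trips.append({
                "type": _read(type_file),
                "temp_c": _millideg_to_c(_read(temp_file)),
            })
        zones.append({
            "name": zone.name,
            "type": _read(zone / "type"),
            "temp_c": _millideg_to_c(_read(zone / "temp")),
            "trips": trips,
        })
    cooling = []
    for dev in sorted(root.glob("cooling_device*")):
        cooling.append({
            "name": dev.name,
            "type": _read(dev / "type"),
            "cur_state": _read(dev / "cur_state"),
            "max_state": _read(dev / "max_state"),
        })
    return zones, cooling


if __name__ == "__main__":
    zones, cooling = thermal_snapshot()
    for z in zones:
        print(z["name"], z["type"], z["temp_c"])
        for t in z["trips"]:
            print("   trip", t["type"], t["temp_c"])
    for c in cooling:
        print(c["name"], c["type"], c["cur_state"], "/", c["max_state"])
```

Running it once right after boot and once under load on each kernel, and
attaching the output to the bugzilla entry, would show whether the
cooling device states change at all on 3.13.x.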

