lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871rmesqkk.fsf@gmx.net>
Date:   Tue, 16 Jun 2020 22:28:43 +0200
From:   Stephen Berman <stephen.berman@....net>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)

On Tue, 16 Jun 2020 17:55:01 +0200 Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> On 2020-06-16 10:13:27 [+0200], Stephen Berman wrote:
>> Yes, thanks, that did it.  Trace attached.
>
> So TZ10 is a temperature sensor of some kind on your motherboard. In
> your v5.6 dmesg there is:
> | thermal LNXTHERM:00: registered as thermal_zone0
> | ACPI: Thermal Zone [TZ10] (17 C)
>
> So. In /sys/class/thermal/thermal_zone0/device/path you should also see
> TZ10. And /sys/class/thermal/thermal_zone0/temp should show the actual
> value.
> This comes from the "thermal" module.

Yes, TZ10 was in the thermal_zone0/device/path and the value in
thermal_zone0/temp was 16800.

> Looking at the trace, might query the temperature every second which
> somehow results in "Dispatching Notify on". I don't understand how it
> gets from reading of the temperature to the notify part, maybe it is
> part of the ACPI…
>
> However. Could you please make sure that the thermal module is not
> loaded at system startup? Adding
>     thermal.off=1
>
> to the kernel commandline should do the trick. And you should see
>    thermal control disabled
>
> in dmesg. 

Confirmed.  And the value in thermal_zone0/temp was now 33000.

>           That means your thermal_zone0 with TZ10 does not show up in
> /sys and nothing should schedule the work-items. This in turn should
> allow you to shutdown your system without the delay.

It did!

> If this works, could you please try to load the module with tzp=300?
> If you add this
>  	thermal.tzp=300
>   
> to the kernel commandline then it should do the trick. You can verify it
> by
>    cat /sys/module/thermal/parameters/tzp 
>
> This should change the polling interval from what ACPI says to 30secs.
> This should ensure that you don't have so many worker waiting. So you
> should also be able to shutdown the system.

Your assessment and predictions are right on the mark!

I'm fine with the thermal.tzp=300 workaround, but it would be good to
find out why this problem started with commit 6d25be57, if my git
bisection was correct, or if it wasn't, then at least somewhere between
5.1.0 and 5.2.0.  Or can you already deduce why?  If not, I'd be more
than happy to continue applying any patches or trying any suggestions
you have, if you want to continue debugging this issue.  In any case,
thanks for pursuing it to this point.

Steve Berman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ