lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 11 Aug 2020 17:25:51 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Stephen Berman <stephen.berman@....net>,
        Zhang Rui <rui.zhang@...el.com>,
        Robert Moore <robert.moore@...el.com>,
        Erik Kaneda <erik.kaneda@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Len Brown <lenb@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
        "open list:ACPI COMPONENT ARCHITECTURE (ACPICA)" <devel@...ica.org>
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)

On 2020-08-11 16:34:09 [+0200], Rafael J. Wysocki wrote:
> On Tue, Aug 11, 2020 at 3:29 PM Sebastian Andrzej Siewior
> <bigeasy@...utronix.de> wrote:
> >
> > On 2020-08-11 13:58:39 [+0200], Stephen Berman wrote:
> > > him about your workaround of adding 'thermal.tzp=300' to the kernel
> > > commandline, and he replied that this works for him too.  And it turns
> > > out we have similar motherboards: I have a Gigabyte Z390 M Gaming
> > > Rev. 1001 board and he has Gigabyte Z390 Designare rev 1.0.
> >
> > Yes. Based on latest dmesg, the ACPI tables contain code which schedules
> > the worker and takes so long. It is possible / likely that his board
> > contains the same tables which leads to the same effect. After all those
> > two boards are very similar from the naming part :)
> > Would you mind to dump the ACPI tables and send them? There might be
> > some hints.
> 
> Do we have a BZ for this?  It would be useful to open one if not.

no, it came via lkml and I looked at it since it was bisected to a
workqueue commit with my signoff…
Stephen, can you open a bug on https://bugzilla.kernel.org/?

> > It might be possible that a BIOS update fixes the problem but I would
> > prefer very much to fix this in kernel to ensure that such a BIOS does
> > not lead to this problem again.
> 
> I agree.
> 
> It looks like one way to address this issue might be to add a rate
> limit for thermal notifications on a given zone.

So one thing is that ACPI says to poll every second and driver is doing
it. This could be increased to something like 15 or 30 seconds as lower
sane level. I don't think there is much value in polling this sensor
every second. As workaound, Stephen is using `thermal.tzp=300' now. 

Would it make sense to flush the workqueue before checking the
temperature? I have no idea what the ACPI is doing there but there is no
upper limit on time how long in may take, right? Doing this inline (and
avoiding the worker) is probably causing other trouble, right?

Sebastian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ