linux-kernel - Re: power-off delay/hang due to commit 6d25be57 (mainline)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87wo4dligz.fsf@gmx.net>
Date:   Thu, 11 Jun 2020 17:39:40 +0200
From:   Stephen Berman <stephen.berman@....net>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)

On Thu, 11 Jun 2020 00:49:26 +0200 Stephen Berman <stephen.berman@....net> wrote:

> On Wed, 10 Jun 2020 12:25:14 +0200 Sebastian Andrzej Siewior
> <bigeasy@...utronix.de> wrote:
> [...]
>>> By the other patch do you mean the following?  (This email was also put
>>> into my spam by GMX and AFAICT has not yet shown up on the vger list.)
>>
>> Yes. But now that I played a little with it, I made dis:
> [...]
>> So you should have output like:
>> |[   14.131696] Flushing events_freezable_power_
>> the flush that hangs
>>
>> |[   17.203246] Flushing events_freezable_power_ over, 0
>> It has been noticed (0 at the end)
>>
>> |[   17.203849] Showing busy workqueues and worker pools:
>> |[   17.204400] workqueue events_freezable_power_: flags=0x84
>> |[   17.205009]   pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>> |[   17.205774]     in-flight: 2147:disk_events_workfn
>> |[   17.206268] workqueue kblockd: flags=0x18
>> |[   17.206682]   pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
>> |[   17.207428]     pending: blk_mq_requeue_work
>> |[ 17.207958] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=2s workers=3 idle:
>> | 535 42
>>
>> A dump of all pending worqueue items.
>>
>> |[   20.275239] Flushing events_freezable_power_ over, 0
>> |[   20.275812] Showing busy workqueues and worker pools:
>> |[   20.276385] workqueue events_freezable_power_: flags=0x84
>> |[   20.276972]   pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>> |[   20.277712]     in-flight: 2147:disk_events_workfn
>> |[   20.278245] workqueue kblockd: flags=0x18
>> |[   20.278685]   pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
>> |[   20.279471]     pending: blk_mq_requeue_work
>> |[ 20.279989] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=5s workers=3 idle:
>> | 535 42
>>
>> Another atempt. As you see `blk_mq_requeue_work' is references in both
>> outputs but it makes progress (it is scheduled multiple times). The item
>> `disk_events_workfn' stucks.
>
> I rebuilt kernel 5.6.4 with that patch, but before calling `shutdown -h
> now' I guess I should have set printk_delay, because this time there was
> a lot of output and it went by too fast.  I'll try to delay it the next
> time.

I've done that now.  I've sent you screenshots offlist.  Here's a brief
summary: The initial shutdown log output is essentially the same as the
transcription I posted upthread, except that it is interspersed with
messages from your patch like "Flush hid_reset+0x0/0x80".  Then comes:

ACPI: Preparing to enter system sleep state S5
Flushing kacpid
Flushing kacpid_notify
Flushing kacpid_notify over, 0
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
     pending: drm_fb_helper_diry_work
workqueue kacpid_notify: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 refcnt=4185
     in-flight: 1578:acpi_os_execute_deferred
     delayed: acpi_os_execute_deferred, acpi_os_execute_deferred, ...

The "acpi_os_execute_deferred" messages were repeated many times in the
above line, then every 20-30 seconds again for several minutes.  Then
suddenly a call trace appeared which was similar but not identical to
the one I posted upthread, and each line of the trace was followed by
the line ", acpi_os_execute_deferred".  This went by quite quickly even
with the printk_delay I added, and I was unable to photograph the start
of it and couldn't get all of the subsequent output, but the screenshots
show some of it.  After several minutes of this output I again did a
hard reboot.

Steve Berman