lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87imfyh6yx.fsf@gmx.net>
Date:   Thu, 11 Jun 2020 00:49:26 +0200
From:   Stephen Berman <stephen.berman@....net>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)

On Wed, 10 Jun 2020 12:25:14 +0200 Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> On 2020-06-10 10:21:29 [+0200], Stephen Berman wrote:
>> (GMX put your email into my spam folder, so I didn't see it before I
>> sent my followup about removing the wifi firmware.)
>
> okay.
>
>> On Tue, 9 Jun 2020 22:23:39 +0200 Sebastian Andrzej Siewior
>> <bigeasy@...utronix.de> wrote:
>> >   scripts/decode_stacktrace.sh vmlinux $(pwd)/ kernel_power_off+0x3d/0x70
>> >
>> > should reveal that.
>>
>> I saved the call trace lines to a file and did this (with `$(pwd)/' the
>> script did not run):
>>
>> steve [ /sources/linux-5.6.4 ]$ ./scripts/decode_stacktrace.sh vmlinux
>> . kernel_power_off+0x3d/0x70 < /home/steve/temp/call_trace
>> [1352.***] Call Trace:
>> [1352.***] ? __schedule (core.c:?)
>> [1352.***] schedule (??:?)
>> [1352.***] schedule_timeout (??:?)
>> [1352.***] ? acpi_os_release_object (??:?)
>> [1352.***] ? acpi_ut_update_object_reference (??:?)
>> [1352.***] wait_for_completion (??:?)
>> [1352.***] ? wake_up_q (??:?)
>> [1352.***] flush_workqueue (??:?)
>> [1352.***] kernel_power_off (??:?)
>> [1352.***] __do_sys_reboot (reboot.c:?)
>> [1352.***] ? do_send_sig_info (??:?)
>> [1352.***] ? kill_pid_info (??:?)
>> [1352.***] ? kill_something_info (signal.c:?)
>> [1352.***] ? __x64_sys_kill (??:?)
>> [1352.***] do_syscall_64 (??:?)
>> [1352.***] entry_SYSCALL_64_after_hwframe (??:?)
>> [1352.***] RIP: 0033:0x7f95dd6992c3
>> [1352.***] Code: Bad RIP value.
>> objdump: '/tmp/tmp.BlpNDdSQpG.o': No such file
>>
>> Did I do something wrong or do I need to pass certain compiler flags
>> when building the kernel (when I build Emacs I pass CFLAGS='-Og -g3')?
>
> the arguments last argument should not be kernel_power_off+0x3d/0x70. I
> should have been entered after "enter" but you can also pipe the whole
> file as you did.
> If it gives you the ?? then might need
>   CONFIG_DEBUG_INFO

Thanks.  I set that, recompiled and now get this:


Download attachment "decode_stacktrace" of type "application/text/plain" (2637 bytes)


[...]
>> By the other patch do you mean the following?  (This email was also put
>> into my spam by GMX and AFAICT has not yet shown up on the vger list.)
>
> Yes. But now that I played a little with it, I made dis:
[...]
> So you should have output like:
> |[   14.131696] Flushing events_freezable_power_
> the flush that hangs
>
> |[   17.203246] Flushing events_freezable_power_ over, 0
> It has been noticed (0 at the end)
>
> |[   17.203849] Showing busy workqueues and worker pools:
> |[   17.204400] workqueue events_freezable_power_: flags=0x84
> |[   17.205009]   pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> |[   17.205774]     in-flight: 2147:disk_events_workfn
> |[   17.206268] workqueue kblockd: flags=0x18
> |[   17.206682]   pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
> |[   17.207428]     pending: blk_mq_requeue_work
> |[ 17.207958] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=2s workers=3 idle:
> | 535 42
>
> A dump of all pending worqueue items.
>
> |[   20.275239] Flushing events_freezable_power_ over, 0
> |[   20.275812] Showing busy workqueues and worker pools:
> |[   20.276385] workqueue events_freezable_power_: flags=0x84
> |[   20.276972]   pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> |[   20.277712]     in-flight: 2147:disk_events_workfn
> |[   20.278245] workqueue kblockd: flags=0x18
> |[   20.278685]   pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
> |[   20.279471]     pending: blk_mq_requeue_work
> |[ 20.279989] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=5s workers=3 idle:
> | 535 42
>
> Another atempt. As you see `blk_mq_requeue_work' is references in both
> outputs but it makes progress (it is scheduled multiple times). The item
> `disk_events_workfn' stucks.

I rebuilt kernel 5.6.4 with that patch, but before calling `shutdown -h
now' I guess I should have set printk_delay, because this time there was
a lot of output and it went by too fast.  I'll try to delay it the next
time.  After the initial output (which went by too fast for me to
photograph) there were endlessly repeated screenfuls of output like this:

[ 2578.402398] , acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred, acpi_os_execute_deferred

and at the bottom of the screen this:

[ 2578.427680] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 1331

After this line there was a brief pause (so I was able to photograph it)
and then it repeated, always the same.  In the last line pool, cpus,
node, flags, nice and hung were always 0, and workers was always 4.
Only idle changed, but seemed to cycle through a few values.  After
watching this repetition for several minutes, I finally did a hard
reboot.

Steve Berman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ