linux-kernel - Re: power-off delay/hang due to commit 6d25be57 (mainline)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tuzjcovq.fsf@gmx.net>
Date:   Wed, 10 Jun 2020 10:21:29 +0200
From:   Stephen Berman <stephen.berman@....net>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)

(GMX put your email into my spam folder, so I didn't see it before I
sent my followup about removing the wifi firmware.)

On Tue, 9 Jun 2020 22:23:39 +0200 Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> On 2020-06-09 12:06:23 [+0200], Stephen Berman wrote:
>> I recompiled kernel 5.6.4 with the printk() call you suggested, then
>> booted the kernel with "ignore_loglevel initcall_debug" (but leaving the
>> CDROM and wifi intact for now).  After working as I normally do, I
>> called `shutdown -h now', again as usual.  After the "Bringing down the
>> loopback interface" message there were these two messages:
>> 
>> reboot: __do_sys_reboot(317)CMD: 89abcdef
>> reboot: __do_sys_reboot(317)CMD: 4321fedc
>> 
>> Then nothing more for two minutes, then, as previously:
>> 
>> sr 5:0:0:0: tag#10 timing out command, waited 120 seconds.
>> 
>> Then I did a hard reboot.
>> 
>> This morning I detached the cables to the CDROM (but left the disk in
>> the box) and again booted 5.6.4 with "ignore_loglevel initcall_debug".
>> After working for a bit, I called `shutdown -h now', and now there were
>> quite a few more messages, but again the machine did not power off.
>> Here is the ouput, which I transcribed carefully, hopefully without
>> error (I omitted the fractional parts of the timestamps and some of the
>> usb1-portX messages):
>
> If it helps you could delay printks on shutdown (via
> /proc/sys/kernel/printk_delay) and record a video clip. I could make
> storage available for an upload.

So far this hasn't been necessary, since the new output until the first
two-minute pause fit on one screen and I could photograph it.

> | *   Bringing down the loopback interface...
> | [1123.***] reboot: __do_sys_reboot(317)CMD: 89abcdef
> | [1123.***] reboot: __do_sys_reboot(317)CMD: 4321fedc
> This happens quicker than I expected. Initially I assumed that you are
> not there yet and userland is still poking at your cdrom. Accordin
> …
> | [1124.***] sd 4:0:0:0: shutdown
> | [1124.***] sd 4:0:0:0: shutdown [sda] Synchronizing SCSI cache
> | [1124.***] sd 4:0:0:0: shutdown [sda] Stopping disk
>
> After this, there shouldn't be any outstanding disk requests.
> …
> | [1125.***] ahci 0000:00:17.0: shutdown
>
> This disables the ahci controller which means disk/cdrom requests won't
> be answered by the hardware.
> …
> | [1125.***] ACPI: Preparing to enter system sleep state S5
> | [1352.***] INFO: task halt:5187 blocked for more than 122 seconds.
> | [1352.***]                  Not tainted 5.6.4 #4
> | [1352.***] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
> | [1352.***] halt            D13856  5187   5186 0x00004000
> | [1352.***] Call Trace:
> | [1352.***]  ? __schedule+0x272/0x5b0
> | [1352.***]  schedule+0x45/0xb0
> | [1352.***]  schedule_timeout+0x204/0x2f0
> | [1352.***]  ? acpi_os_release_object+0x5/0x10
> | [1352.***]  ? acpi_ut_update_object_reference+0x14e/0x1d2
> | [1352.***]  wait_for_completion+0xa3/0x100
> | [1352.***]  ? wake_up_q+0x90/0x90
> | [1352.***]  flush_workqueue+0x130/0x420
>
> I have no idea where this flush_workqueue() is coming from. The command
>   scripts/decode_stacktrace.sh vmlinux $(pwd)/ kernel_power_off+0x3d/0x70
>
> should reveal that.

I saved the call trace lines to a file and did this (with `$(pwd)/' the
script did not run):

steve [ /sources/linux-5.6.4 ]$ ./scripts/decode_stacktrace.sh vmlinux . kernel_power_off+0x3d/0x70 < /home/steve/temp/call_trace 
[1352.***] Call Trace:
[1352.***] ? __schedule (core.c:?) 
[1352.***] schedule (??:?) 
[1352.***] schedule_timeout (??:?) 
[1352.***] ? acpi_os_release_object (??:?) 
[1352.***] ? acpi_ut_update_object_reference (??:?) 
[1352.***] wait_for_completion (??:?) 
[1352.***] ? wake_up_q (??:?) 
[1352.***] flush_workqueue (??:?) 
[1352.***] kernel_power_off (??:?) 
[1352.***] __do_sys_reboot (reboot.c:?) 
[1352.***] ? do_send_sig_info (??:?) 
[1352.***] ? kill_pid_info (??:?) 
[1352.***] ? kill_something_info (signal.c:?) 
[1352.***] ? __x64_sys_kill (??:?) 
[1352.***] do_syscall_64 (??:?) 
[1352.***] entry_SYSCALL_64_after_hwframe (??:?) 
[1352.***] RIP: 0033:0x7f95dd6992c3
[1352.***] Code: Bad RIP value.
objdump: '/tmp/tmp.BlpNDdSQpG.o': No such file

Did I do something wrong or do I need to pass certain compiler flags
when building the kernel (when I build Emacs I pass CFLAGS='-Og -g3')?

> The cdrom is polled by the kernel every two seconds. I *think* the
> kernel is blocked for some reason and then the cdrom polled. This can't
> complete and you see the srX warning. The window is quite small but not
> impossible. 
> I managed to fabricate this case but after 
> |[  137.581613] sr 3:0:0:0: tag#1 timing out command, waited 120s
>
> The system shutdowns. So you are facing something different. I am
> surprised that the workqueue stall detector did not yell here. The patch
> at the bottom should get rid of the sr warnings.

But the cdrom seems to have nothing to do with power-off failing, since
it happens even when the cdrom is detached (and hence there were no sr
warnings in the log output), or am I misunderstanding you?

> | [1352.***]  kernel_power_off+0x3d/0x70
> | [1352.***]  __do_sys_reboot+0x140/0x220
> …
>
>> After the last message here, the "echo", Call Trace, RIP and register
>> messages were repeated exactly as above.  At this point I did a hard
>> reboot.  A web search found something similar to the above output
>> ("System hangs (bad RIP value) when disk used in pool is removed"), but
>> didn't otherwise seem to be related to the problem I'm having.
>> 
>> Do you still want me to check whether removing the iwlwifi driver makes
>> a differece?  And with the CDROM still detached, or does that not
>> matter?
>
> I assumed the wifi driver shuts the AHCI port for some reason. But
> according to this log it does not happen, the ahci port is shutdown
> properly. The patch at the bottom should get rid of the sr warning. Then
> could you please try the other patch so we see which workqueue is
> blocked? I am curious to see why the system is blocked.

By the other patch do you mean the following?  (This email was also put
into my spam by GMX and AFAICT has not yet shown up on the vger list.)

From: Hillf Danton <hdanton@...a.com>
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)
To: Stephen Berman <stephen.berman@....net>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Markus Elfring <Markus.Elfring@....de>,
	linux-kernel@...r.kernel.org
Date: Tue,  9 Jun 2020 23:06:07 +0800
[...]
Curious what the workqueue it is blocking halt.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2826,7 +2826,14 @@ void flush_workqueue(struct workqueue_st
 
 	mutex_unlock(&wq->mutex);
 
-	wait_for_completion(&this_flusher.done);
+	if (system_state == SYSTEM_POWER_OFF) {
+		/*
+		 * save data before running at warned risk in any form
+		 */
+		wait_for_completion_timeout(&this_flusher.done, 2*HZ);
+		pr_info("workqueue %s 2s since SYSTEM_POWER_OFF\n", wq->name);
+	} else
+		wait_for_completion(&this_flusher.done);
 
 	/*
 	 * Wake-up-and-cascade phase

Steve Berman