lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 9 Jun 2020 22:23:39 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Stephen Berman <stephen.berman@....net>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)

On 2020-06-09 12:06:23 [+0200], Stephen Berman wrote:
> I recompiled kernel 5.6.4 with the printk() call you suggested, then
> booted the kernel with "ignore_loglevel initcall_debug" (but leaving the
> CDROM and wifi intact for now).  After working as I normally do, I
> called `shutdown -h now', again as usual.  After the "Bringing down the
> loopback interface" message there were these two messages:
> 
> reboot: __do_sys_reboot(317)CMD: 89abcdef
> reboot: __do_sys_reboot(317)CMD: 4321fedc
> 
> Then nothing more for two minutes, then, as previously:
> 
> sr 5:0:0:0: tag#10 timing out command, waited 120 seconds.
> 
> Then I did a hard reboot.
> 
> This morning I detached the cables to the CDROM (but left the disk in
> the box) and again booted 5.6.4 with "ignore_loglevel initcall_debug".
> After working for a bit, I called `shutdown -h now', and now there were
> quite a few more messages, but again the machine did not power off.
> Here is the ouput, which I transcribed carefully, hopefully without
> error (I omitted the fractional parts of the timestamps and some of the
> usb1-portX messages):

If it helps you could delay printks on shutdown (via
/proc/sys/kernel/printk_delay) and record a video clip. I could make
storage available for an upload.


| *   Bringing down the loopback interface...
| [1123.***] reboot: __do_sys_reboot(317)CMD: 89abcdef
| [1123.***] reboot: __do_sys_reboot(317)CMD: 4321fedc
This happens quicker than I expected. Initially I assumed that you are
not there yet and userland is still poking at your cdrom. Accordin
…
| [1124.***] sd 4:0:0:0: shutdown
| [1124.***] sd 4:0:0:0: shutdown [sda] Synchronizing SCSI cache
| [1124.***] sd 4:0:0:0: shutdown [sda] Stopping disk

After this, there shouldn't be any outstanding disk requests.
…
| [1125.***] ahci 0000:00:17.0: shutdown

This disables the ahci controller which means disk/cdrom requests won't
be answered by the hardware.
…
| [1125.***] ACPI: Preparing to enter system sleep state S5
| [1352.***] INFO: task halt:5187 blocked for more than 122 seconds.
| [1352.***]                  Not tainted 5.6.4 #4
| [1352.***] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
| [1352.***] halt            D13856  5187   5186 0x00004000
| [1352.***] Call Trace:
| [1352.***]  ? __schedule+0x272/0x5b0
| [1352.***]  schedule+0x45/0xb0
| [1352.***]  schedule_timeout+0x204/0x2f0
| [1352.***]  ? acpi_os_release_object+0x5/0x10
| [1352.***]  ? acpi_ut_update_object_reference+0x14e/0x1d2
| [1352.***]  wait_for_completion+0xa3/0x100
| [1352.***]  ? wake_up_q+0x90/0x90
| [1352.***]  flush_workqueue+0x130/0x420

I have no idea where this flush_workqueue() is coming from. The command
  scripts/decode_stacktrace.sh vmlinux $(pwd)/ kernel_power_off+0x3d/0x70

should reveal that.
The cdrom is polled by the kernel every two seconds. I *think* the
kernel is blocked for some reason and then the cdrom polled. This can't
complete and you see the srX warning. The window is quite small but not
impossible. 
I managed to fabricate this case but after 
|[  137.581613] sr 3:0:0:0: tag#1 timing out command, waited 120s

The system shutdowns. So you are facing something different. I am
surprised that the workqueue stall detector did not yell here. The patch
at the bottom should get rid of the sr warnings.


| [1352.***]  kernel_power_off+0x3d/0x70
| [1352.***]  __do_sys_reboot+0x140/0x220
…

> After the last message here, the "echo", Call Trace, RIP and register
> messages were repeated exactly as above.  At this point I did a hard
> reboot.  A web search found something similar to the above output
> ("System hangs (bad RIP value) when disk used in pool is removed"), but
> didn't otherwise seem to be related to the problem I'm having.
> 
> Do you still want me to check whether removing the iwlwifi driver makes
> a differece?  And with the CDROM still detached, or does that not
> matter?

I assumed the wifi driver shuts the AHCI port for some reason. But
according to this log it does not happen, the ahci port is shutdown
properly. The patch at the bottom should get rid of the sr warning. Then
could you please try the other patch so we see which workqueue is
blocked? I am curious to see why the system is blocked.

> Steve Berman

Sebastian

From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Date: Tue, 9 Jun 2020 22:13:46 +0200
Subject: [PATCH] SCSI: Disable CD-ROM poll on shutdown

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
---
 drivers/scsi/sr.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index d2fe3fa470f95..2484e594af283 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -85,6 +85,13 @@ static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt);
 static int sr_done(struct scsi_cmnd *);
 static int sr_runtime_suspend(struct device *dev);
 
+static void sr_shutdown(struct device *dev)
+{
+	struct scsi_cd *cd = dev_get_drvdata(dev);
+
+	disk_block_events(cd->disk);
+}
+
 static const struct dev_pm_ops sr_pm_ops = {
 	.runtime_suspend	= sr_runtime_suspend,
 };
@@ -95,6 +102,7 @@ static struct scsi_driver sr_template = {
 		.owner		= THIS_MODULE,
 		.probe		= sr_probe,
 		.remove		= sr_remove,
+		.shutdown	= sr_shutdown,
 		.pm		= &sr_pm_ops,
 	},
 	.init_command		= sr_init_command,
-- 
2.27.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ