lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87mu6aurfn.fsf@gmx.net>
Date:   Thu, 14 May 2020 23:39:40 +0200
From:   Stephen Berman <stephen.berman@....net>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: power-off delay/hang due to commit 6d25be57 (mainline)

On Thu, 14 May 2020 00:04:28 +0200 Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> On 2020-05-08 23:30:45 [+0200], Stephen Berman wrote:
>> > Can you log the output on the serial console?
>>
>> How do I do that?
>
> The spec for your mainboard says "serial port header". You would need to
> connect a cable there to another computer and log its output.
> The alternative would be to delay the output on the console and use a
> camera.

It's easiest for me to take a picture, since there isn't much output and
in any case the delay happens on it's own ;-).  I'm sending you the
image (from kernel 5.6.4) off-list since even after reducing it it's 1.2
MB large.

>> > If the commit you cited is really the problem then it would mean that a
>> > worker isn't scheduled for some reason. Could you please enable
>> > CONFIG_WQ_WATCHDOG to see if workqueue core code notices that a worker
>> > isn't making progress?

I enabled that and also CONFIG_SOFTLOCKUP_DETECTOR,
CONFIG_HARDLOCKUP_DETECTOR and CONFIG_DETECT_HUNG_TASK, which had all
been unset previously.

>> How will I know if that happens, is there a specific message in the tty?
>
> On the tty console where you see the "timing out command, waited"
> message, there should be something starting with
> |BUG: workqueue lockup - pool
>
> following with the pool information that got stuck. That code checks the
> workqueues every 30secs by default. So if you waited >= 60secs then
> system is not detecting a stall.

As you can see in the photo, there was no message about a workqueue
lockup, only "task halt:5320 blocked for more than <XXX> seconds" every
two minutes.  I suppose that comes from one of the other options I
enabled.  Does it reveal anything about the problem?

> As far as I can tell, there is nothing special on your system. The CD
> and disk drives are served by the AHCI controller. There is no special
> SCSI/SATA/SAS controller.
> Right now I have no idea how the workqueues fit in the picture. Could
> you please check if the stall-dector says something?

Is that the message I repeated above or do you mean the workqueue?

> Is it possible to show me output when the timeout message comes? My
> guess is that the system is going down and before unounting/remount RO
> the filesystem it flushes its last data. But this is done before issuing
> the "halt-syscall".

The entire output from `shutdown -h now' is in the picture; after the
fourth "timing out command" message, I pressed the reset button.

Steve Berman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ