linux-kernel - Re: [Bug 199003] console stalled, cause Hard LOCKUP.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 23 Mar 2018 21:06:22 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        bugzilla-daemon@...zilla.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, wen.yang99@....com.cn,
        Petr Mladek <pmladek@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Hellwig <hch@...radead.org>
Subject: Re: [Bug 199003] console stalled, cause Hard LOCKUP.

Hello Steven,

On (03/22/18 18:25), Steven Rostedt wrote:
> > static void wait_for_xmitr(struct uart_8250_port *up, int bits)
> > {
> > 	unsigned int status, tmout = 10000;
> > 
> > 	/* Wait up to 10ms for the character(s) to be sent. */
> > 	for (;;) {
			....
> > 		if (--tmout == 0)
> > 			break;
> > 		udelay(1);
> > 		touch_nmi_watchdog();
> > 	}
> > 
> > 	/* Wait up to 1s for flow control if necessary */
> > 	if (up->port.flags & UPF_CONS_FLOW) {
> > 		for (tmout = 1000000; tmout; tmout--) {
			....
> > 			udelay(1);
> > 			touch_nmi_watchdog();
> > 		}
> > 	}
> > 	...
> > }
> > 
> > a 1+ second long busy loop in the console driver is quite close to
> 
> Yeah that's nasty but shouldn't ever hit 1 second unless there's
> hardware issues.

Would probably agree. But even without that 1 second UPF_CONS_FLOW loop
(it seems that there are not so many consoles that have that flag set),
we have a 10ms [10000 * udelay(1)] loop after every character. So if we
print a 512-bytes kernel message [1/2 of a max printk message length]
than its 0.01 * (512 + \r + \n) seconds. Which is a lot, unless I
miscalculated 10000 * udelay(1). Delays in the first loop are normal,
probably not always 10ms, tho, but still.

[..]
> > Sometimes I really wish we had detached consoles. Direct printk()->console
> > is nice and cool, but... we can't have it. I don't want to pressure for
> > printk_kthread, but look at all those consoles. There are not so many ways
> > out. What do you think?
> 
> If anything, perhaps we could have a printk thread that is triggered to
> wake up if any print is taking a long time, or if there's a lot of
> prints happening. And then it could try to print anything in the
> buffer. Just the mere printk() in the thread should do the hand off.

OK. Let's think more about this. I have that "wake up printk_kthread if
any print is taking a long time, or if there's a lot of prints happening"
patch set: a watchdog threshold based solution. I didn't update it for a
while, but can take a look. Detaching printouts does look to me like
something we need to do.

Offloading does not work well enough with hand off (in its current
form). Because the CPU that we offloaded printing from can re-take
console_sem the very next time it calls printk().

Something like

       spin_lock(&lock)
	printk() -> 8250
	 console_unlock()
	 offload		printk_kthread
	printk() -> 8250	 console_unlock()
	printk() -> 8250	  hand off
	 console_unlock()

so if printk() -> 8250 is very slow and is under some spin_lock
(like in case of Bug 199003) then offloading doesn't make that much
difference here. Right? If we don't hand off from printk_kthread
then things do look different.

> I wonder how bad it would be to wake the printk thread whenever a
> printk() is executed. Printks really shouldn't be that common.

That didn't work quite well for me. But I must admit that my previous
printk_kthread version had some major design problems - preemption in
printk path [including printk_kthread preemption], the lack of any
reliable way for any CPU to control what printk_kthread is doing, and
so on and on. Current version, I guess, is better. But I don't do
"vprintk_emit() -> wake_up(printk_kthread)", instead I let "direct"
printks and wake_up(printk_kthread) only when current CPU spent too
much time in call_console_drivers(); it's quite easy to track given
that printk->console_unlock() is preempt_disable() now. Besides,
offloading from console_unlock() automatically handles various cases
for use. Like printk_deferred() -> IRQ -> console_unlock() and so on.

What do you think?

	-ss