lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180322021437.GA3181@jagdpanzerIV>
Date:   Thu, 22 Mar 2018 11:14:37 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     bugzilla-daemon@...zilla.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, wen.yang99@....com.cn,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Petr Mladek <pmladek@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Hellwig <hch@...radead.org>
Subject: Re: [Bug 199003] console stalled, cause Hard LOCKUP.

Hello Steven,

On (03/21/18 09:44), Steven Rostedt wrote:
[..]
> >  #4 [ffff881ffb0b7540] delay_tsc at ffffffff81326724
> >  #5 [ffff881ffb0b7548] __const_udelay at ffffffff81326678
> >  #6 [ffff881ffb0b7558] wait_for_xmitr at ffffffff814056e0
> >  #7 [ffff881ffb0b7580] serial8250_console_putchar at ffffffff814058ac
> >  #8 [ffff881ffb0b75a0] uart_console_write at ffffffff8140035a
> >  #9 [ffff881ffb0b75d0] serial8250_console_write at ffffffff814057fe
> > #10 [ffff881ffb0b7618] call_console_drivers.constprop.17 at ffffffff81087011
> > #11 [ffff881ffb0b7640] console_unlock at ffffffff810889e9
> > #12 [ffff881ffb0b7680] vprintk_emit at ffffffff81088df4
> > #13 [ffff881ffb0b76f0] dev_vprintk_emit at ffffffff81428e72
> > #14 [ffff881ffb0b77a8] dev_printk_emit at ffffffff81428eee
> > #15 [ffff881ffb0b7808] __dev_printk at ffffffff8142937e
> > #16 [ffff881ffb0b7818] dev_printk at ffffffff8142942d
> > #17 [ffff881ffb0b7888] sdev_prefix_printk at ffffffff81463771
> > #18 [ffff881ffb0b7918] scsi_prep_state_check at ffffffff814598e4
> > #19 [ffff881ffb0b7928] scsi_prep_fn at ffffffff8145992d
> > #20 [ffff881ffb0b7960] blk_peek_request at ffffffff812f0826
> > #21 [ffff881ffb0b7988] scsi_request_fn at ffffffff8145b588
> > #22 [ffff881ffb0b79f0] __blk_run_queue at ffffffff812ebd63
> > #23 [ffff881ffb0b7a08] blk_queue_bio at ffffffff812f1013        -----》acquired q->queue_lock and wait for console_write to finish
> 
> What do you mean by "wait for console_write to finish" here?

*My guess*

I suppose a typical "printk->console_unlock() under spin_lock" thing

	CPU0						CPU1
	spin_lock(&lock)
	 printk						spin_lock(&lock)
	  console_unlock()
	   for (;;)
	    call_console_drivers()

While other CPUs printk->log_store() and some (or may be just one) is locked
up on spin_lock(&lock). So we have several lockup vectors here - one is
console_unlock() under spin_lock and the other one is CPUs spinning on the
very same spin_lock. So handoff should help here, no doubt. But it cannot help
if other CPUs will start to printk_deferred() instead of printk().


Looking at
 printk()->call_console_drivers()->serial8250_console_putchar()->wait_for_xmitr()

... wait_for_xmitr() can spin for over 1 second waiting for the UART_MSR_CTS
bit.

static void wait_for_xmitr(struct uart_8250_port *up, int bits)
{
	unsigned int status, tmout = 10000;

	/* Wait up to 10ms for the character(s) to be sent. */
	for (;;) {
		status = serial_in(up, UART_LSR);

		up->lsr_saved_flags |= status & LSR_SAVE_FLAGS;

		if ((status & bits) == bits)
			break;
		if (--tmout == 0)
			break;
		udelay(1);
		touch_nmi_watchdog();
	}

	/* Wait up to 1s for flow control if necessary */
	if (up->port.flags & UPF_CONS_FLOW) {
		for (tmout = 1000000; tmout; tmout--) {
			unsigned int msr = serial_in(up, UART_MSR);
			up->msr_saved_flags |= msr & MSR_SAVE_FLAGS;
			if (msr & UART_MSR_CTS)
				break;
			udelay(1);
			touch_nmi_watchdog();
		}
	}
	...
}

a 1+ second long busy loop in the console driver is quite close to
"problems guaranteed". But, wait, there is even more. This wait_for_xmitr()
busy wait is happening after every character we print on the console. So
printk("foo") will generate 5 * wait_for_xmitr() busy loops [foo + \r + \n].
They punch&touch watchdog a lot, so at the least the system won't get killed
by the hardlockup detector. But at the same time, it's still potentially a
1+ second busy loop in the console driver * strlen(message).

Sometimes I really wish we had detached consoles. Direct printk()->console
is nice and cool, but... we can't have it. I don't want to pressure for
printk_kthread, but look at all those consoles. There are not so many ways
out. What do you think?

	-ss

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ