lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Date:   Wed, 21 Mar 2018 16:28:51 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     bugzilla-daemon@...zilla.kernel.org
Cc:     sergey.senozhatsky@...il.com, Steven Rostedt <rostedt@...dmis.org>,
        Petr Mladek <pmladek@...e.com>, linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [Bug 199003] console stalled, cause Hard LOCKUP.

On (03/20/18 09:34), bugzilla-daemon@...zilla.kernel.org wrote:
[..]
> Thanks very much.
> commit e480af09c49736848f749a43dff2c902104f6691 avoided the NMI watchdog
> trigger.

Hm, okay... But "touch_nmi_watchdog() everywhere printk/console-related"
is not exactly where I wanted us to be.

By the way e480af09c49736848f749a43dff2c902104f6691 is from 2006.
Are you sure you meant exactly that commit? What kernel do you use?


Are you saying that none of Steven's patches helped on your setups?


> And this patch may  avdoid long time blocking:
> https://lkml.org/lkml/2018/3/8/584
> 
> We've test it several days.

Hm, printk_deferred is a bit dangerous; it moves console_unlock() to
IRQ. So you still can have the problem of stuck CPUs, it's just now
you shut up the watchdog. Did you test Steven's patches?


A tricky part about printk_deferred() is that it does not use hand off
mechanism. And even more...  What we have with "printk vs printk"
sceanrio

	CPU0			CPU1		...		CPUN

	printk			printk
	 console_unlock 	 hand off			printk
	 			  console_unlock		 hand off
								  console_unlock

turns into a good old "one CPU prints it all" when we have "printk vs
printk_deferred" case. Because printk_deferred just log_store messages
and then _may be_ it grabs the console_sem from IRQ and invokes
console_unlock().

So it's something like this

	CPU0			CPU1		...		CPUN

	printk			printk_deffered
	 console_unlock						printk_deferred
	 console_unlock
	 console_unlock
	...			...				...
				printk_deffered			printk_deferred
	 console_unlock
	 console_unlock


// offtopic  "I can has printk_kthread?"



You now touch_nmi_watchdog() from the console driver [well... at least this
is what e480af09c4973 is doing, but I'm not sure I see how come you didn't
have it applied], so that's why you don't see hard lockups on that CPU0. But
your printing CPU still can stuck, which will defer RCUs on that CPU, etc.
etc. etc. So I'd say that those two approaches

		printk_deferred + touch_nmi_watchdog

combined can do quite some harm. One thing for sure - they don't really fix
any problems.

	-ss

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ