Message-ID: <734c6c65-b9f4-40a9-b6a8-abcb08ee83f5@paulmck-laptop>
Date: Wed, 21 May 2025 08:06:23 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Jens Axboe <axboe@...nel.dk>
Cc: John Ogness <john.ogness@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>, Petr Mladek <pmladek@...e.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	"senozhatsky@...omium.org" <senozhatsky@...omium.org>
Subject: Re: printk NMI splat on boot

On Wed, May 21, 2025 at 07:05:09AM -0600, Jens Axboe wrote:
> On 5/21/25 12:06 AM, Paul E. McKenney wrote:
> > On Tue, May 20, 2025 at 02:41:40PM -0600, Jens Axboe wrote:
> >> On 5/20/25 2:18 PM, Jens Axboe wrote:
> >>>> What values are you using for CONFIG_RCU_EXP_CPU_STALL_TIMEOUT and
> >>>> CONFIG_RCU_CPU_STALL_TIMEOUT?
> >>>
> >>> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> >>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=2
> >>
> >> This was =20 btw, guess it could cut a bit too much...
> > 
> > Just confirming that setting CONFIG_RCU_EXP_CPU_STALL_TIMEOUT to two
> > milliseconds is more than a bit on the aggressive side.  ;-)
> 
> Sorry guess I wasn't clear - I had pasted in =2, but the setting in my
> config was =20.

Ah, got it!  Less aggressive, but not recommended for anything other than
small devices, such as Android smartphones.

> > Setting it to 20 milliseconds is OK for smartphone-class devices, but
> > to the best of my knowledge, setting it less than 21 seconds (as in
> > 21,000 milliseconds) has not been tested on any other platform.
> > 
> >> Changed them to:
> >>
> >> CONFIG_RCU_CPU_STALL_TIMEOUT=100
> >> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
> >>
> >> and complaining is gone.
> > 
> > This makes it take the default, which in this case would be the specified
> > CONFIG_RCU_CPU_STALL_TIMEOUT value of 100 seconds.  Which is an unusually
> > long timeout -- mainline these days is 21 seconds and some distros still
> > use the old value of 60 seconds.
> 
> IMHO the settings for these are very odd. Which I guess is fine for
> debugging kind of infrastructure, but fairly nonsensical in any case.
> But not really that important - looks like RCU_EXP_CPU_STALL_TIMEOUT has
> a default of '0' so not sure how on earth I ended up with 20 in that
> one. Most likely from not reading the help entry and hence setting it
> similarly to RCU_CPU_STALL_TIMEOUT.

Agreed, it would be better if both had the same units.  But back 20 years
ago, milliseconds would have seemed insane for RCU_CPU_STALL_TIMEOUT.
In fact, many would have argued that the current setting of "only"
21 seconds would be insane.  And then a few years ago, people really
needed milliseconds for RCU_EXP_CPU_STALL_TIMEOUT, so here we are...
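
To make the unit mismatch concrete, here is a rough sketch of how the two
settings combine, based only on what is described in this thread:
RCU_CPU_STALL_TIMEOUT is in seconds, RCU_EXP_CPU_STALL_TIMEOUT is in
milliseconds, and zero for the latter means "take the default from the
former".  The function name is made up for illustration, and this is not
the kernel's actual code, which may apply additional limits.

```python
MSEC_PER_SEC = 1000

def effective_exp_stall_timeout_ms(stall_timeout_s, exp_stall_timeout_ms):
    """Hypothetical helper: expedited stall timeout in milliseconds.

    stall_timeout_s      -- CONFIG_RCU_CPU_STALL_TIMEOUT (seconds)
    exp_stall_timeout_ms -- CONFIG_RCU_EXP_CPU_STALL_TIMEOUT (milliseconds)
    """
    if exp_stall_timeout_ms == 0:
        # Zero means "use the normal timeout", converted to milliseconds.
        return stall_timeout_s * MSEC_PER_SEC
    # Any nonzero value is already in milliseconds and is used as given.
    return exp_stall_timeout_ms

# The configurations mentioned in this thread:
print(effective_exp_stall_timeout_ms(21, 20))   # original config
print(effective_exp_stall_timeout_ms(100, 0))   # changed config
```

So setting the expedited timeout "similarly" to the normal one (20 versus
21) gives a value three orders of magnitude smaller, while setting it to
zero gives the full 100 seconds.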

Me, I should have seen it coming.  After all, the equivalent to
RCU_CPU_STALL_TIMEOUT in DYNIX/ptx was 1.5 seconds.  But, again, here
we are...

							Thanx, Paul
