Date:   Tue, 21 Mar 2023 08:38:50 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, kernel-team@...a.com,
        rostedt@...dmis.org, jgross@...e.com, mingo@...nel.org,
        corbet@....net
Subject: Re: [PATCH RFC smp] Remove diagnostics and adjust config for CSD
 lock diagnostics

On Tue, Mar 21, 2023 at 11:22:20AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 20, 2023 at 05:54:39PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This series removes CSD-lock diagnostics that were once very useful
> > but which have not seen much action since.  It also adjusts Kconfig and
> > kernel-boot-parameter setup.
> > 
> > 1.	locking/csd_lock: Add Kconfig option for csd_debug default.
> > 
> > 2.	locking/csd_lock: Remove added data from CSD lock debugging.
> > 
> > 3.	locking/csd_lock: Remove per-CPU data indirection from CSD
> > 	lock debugging.
> > 
> > 4.	kernel/smp: Make csdlock_debug= resettable.
> > 
> > 						Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> >  Documentation/admin-guide/kernel-parameters.txt   |   17 -
> >  b/Documentation/admin-guide/kernel-parameters.txt |    6 
> >  b/kernel/smp.c                                    |    2 
> >  b/lib/Kconfig.debug                               |    9 
> >  kernel/smp.c                                      |  260 ++--------------------
> >  5 files changed, 47 insertions(+), 247 deletions(-)
> 
> Yay!! How do you want to route these, should I take them through tip?

Either way works for me.  If you take them into -tip, I will drop them
from -rcu.  If you don't take them into -tip, I will send Linus a pull
request for the upcoming merge window.  And if you take them at just
the wrong time, we will both send them to Linus.  ;-)

Your choice!

> What about the rest of the thing? Your commits seem to suggest it's
> still actually used -- why? Are there still more virt bugs?

Thus far, no luck.  I proposed ditching some of the stack traces, but
that got shot down.

The remaining CSD-lock diagnostics find the following issues:  (1) CPU
looping with interrupts disabled.  (2) CPU stuck in a longer-than-average
SMI handler or other firmware sand trap.  (3) CPU fail stopped.

In theory, we could drop the RCU CPU stall warning to five seconds and
catch this same stuff.  Unfortunately, in practice, there would need to
be lots of churn from CPUs looping with preemption disabled.  Which we
still get from time to time even at 21 seconds.
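
As a rough sketch of what dropping to five seconds would look like,
assuming the stock rcupdate.rcu_cpu_stall_timeout boot parameter and the
CONFIG_RCU_CPU_STALL_TIMEOUT Kconfig option (neither is named in this
thread, so treat both as assumptions about the knobs being discussed):

```
# Kernel command line (assumed knob): warn after 5 seconds of RCU CPU stall.
rcupdate.rcu_cpu_stall_timeout=5

# Build-time default in seconds; 21 matches the figure mentioned above.
CONFIG_RCU_CPU_STALL_TIMEOUT=21
```

With the shorter timeout, any CPU that loops with preemption disabled for
more than five seconds would also trip the warning, which is the source of
the churn described above.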

NMIs can be used to deal with #1, and the hard lockup detector in fact
sort of does this.  But these are not helpful for #2 and #3.

So nothing yet, but I am still looking for improved diagnostics.

							Thanx, Paul
