Message-ID: <CAHk-=whNWmVnH_b++g5kjT9jKFNwPcx81hxez=pkrozpXoqVvA@mail.gmail.com>
Date: Tue, 31 Oct 2023 13:06:44 -1000
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: paulmck@...nel.org
Cc: Frederic Weisbecker <frederic@...nel.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, rcu@...r.kernel.org,
Boqun Feng <boqun.feng@...il.com>,
Joel Fernandes <joel@...lfernandes.org>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Uladzislau Rezki <urezki@...il.com>,
Z qiang <qiang.zhang1211@...il.com>
Subject: Re: [GIT PULL] RCU changes for v6.7
On Tue, 31 Oct 2023 at 03:57, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> Would it help if we make rcu_stall_chain_notifier_register() print a
> suitably obnoxious message saying that future RCU CPU stall warnings
> might be unreliable?
It's not the future stall warnings I worry about.
It's literally things like somebody thinking they are being clever,
registering an RCU stall notifier that prints out extra information in
order to be helpful, and in the process taking a spinlock or something
without thinking about it.
And that spinlock might be the *reason* for the RCU stall in the first place.
So now the RCU stall code prints out NOTHING AT ALL, because now the
stall notifier itself has deadlocked.
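To make that failure mode concrete, here is a minimal userspace sketch, not kernel code. Every name in it (`state_lock`, `toy_trylock()`, `helpful_notifier_would_hang()`, `stall_report_notifier_first()`) is hypothetical, and a C11 `atomic_flag` stands in for a kernel spinlock. The notifier uses a trylock purely so the simulation can report the hang instead of actually hanging; a real notifier calling a blocking lock would simply spin forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for a kernel spinlock (hypothetical name). */
static atomic_flag state_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool toy_trylock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set(lock);
}

static void toy_unlock(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/*
 * A "helpful" notifier that needs state_lock to gather extra information.
 * Returns true when a blocking lock acquisition would never have returned,
 * i.e. when the lock is already held by the code that is stalled.
 */
static bool helpful_notifier_would_hang(void)
{
    if (!toy_trylock(&state_lock))
        return true;            /* a real spin_lock() would deadlock here */
    /* ... gather and print the extra information ... */
    toy_unlock(&state_lock);
    return false;
}

/*
 * Simulated stall report that calls the notifier before the core printout.
 * Returns the number of core report lines that actually got printed.
 */
static int stall_report_notifier_first(void)
{
    if (helpful_notifier_would_hang())
        return 0;               /* the core printout is never reached */
    return 1;                   /* the core printout happened */
}
```

With the lock free, `stall_report_notifier_first()` returns 1. Once something holds `state_lock`, as the stalled code would, it returns 0: no report at all.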
This is *exactly* what has happened before with these kinds of
"helpful" exception case notifiers. Because they never trigger in
normal loads, they get basically zero testing - and then when bad
things happen, it turns out that the "helpful" debug code actually
just makes things worse.
Or, if they get testing, they get tested in artificial bad cases (eg
"let's just write a busy loop that hangs for 30 seconds to trigger an
RCU stall"), which doesn't show any of the issues, because they aren't
real bugs with real existing deadlocks.
See what I'm saying? Having notifiers for "sh*t happened" is
fundamentally questionable to begin with, because they get no testing.
And then if you call said notifiers *before* you even have the core
printout for "Look, things are going downhill quickly", you've
turned a bad situation even worse.
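The ordering point can be illustrated with a small userspace model. This is only a sketch under assumptions: `struct stall_log`, `run_notifier()`, `report_notifier_first()` and `report_core_first()` are made-up names, and a notifier that "hangs" is modelled as one that returns false without logging anything. It does not make the notifier safe; it only shows that the core message survives when it is emitted first.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 4

/* Records, in order, what a simulated stall report managed to print. */
struct stall_log {
    const char *events[MAX_EVENTS];
    size_t count;
};

static void log_event(struct stall_log *log, const char *what)
{
    if (log->count < MAX_EVENTS)
        log->events[log->count++] = what;
}

/*
 * Hypothetical notifier. Returning false models "never returned",
 * for example because it deadlocked on a lock the stalled code holds.
 */
static bool run_notifier(struct stall_log *log, bool notifier_hangs)
{
    if (notifier_hangs)
        return false;
    log_event(log, "notifier: extra info");
    return true;
}

/* Notifier first: a hung notifier means nothing is printed at all. */
static void report_notifier_first(struct stall_log *log, bool notifier_hangs)
{
    if (!run_notifier(log, notifier_hangs))
        return;
    log_event(log, "core: RCU stall detected");
}

/* Core printout first: the basic report survives a hung notifier. */
static void report_core_first(struct stall_log *log, bool notifier_hangs)
{
    log_event(log, "core: RCU stall detected");
    run_notifier(log, notifier_hangs);
}
```

With a hanging notifier, `report_notifier_first()` leaves the log empty, while `report_core_first()` still records the core line. The machine is still stuck in the second case, but at least the stall itself was reported.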
I really think that we should *never* have any kind of notifiers for
kernel bugs. They cause problems. The *one* exception is an actual
honest-to-goodness kernel debugger, and then it should literally
*only* be the debugger that can register a notifier, so that you are
*never* in the situation that a kernel without a debugger will just
hang because of some bogus debug notifier.
Linus