Message-ID: <78b18304-c6a5-4ea1-a603-8c8f1d79cc1a@paulmck-laptop>
Date: Tue, 31 Oct 2023 06:57:28 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, rcu@...r.kernel.org,
Boqun Feng <boqun.feng@...il.com>,
Joel Fernandes <joel@...lfernandes.org>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Uladzislau Rezki <urezki@...il.com>,
Z qiang <qiang.zhang1211@...il.com>
Subject: Re: [GIT PULL] RCU changes for v6.7

On Tue, Oct 31, 2023 at 11:19:01AM +0100, Frederic Weisbecker wrote:
> On Mon, Oct 30, 2023 at 06:12:51PM -1000, Linus Torvalds wrote:
> > On Fri, 27 Oct 2023 at 01:33, Frederic Weisbecker <frederic@...nel.org> wrote:
> > >
> > > rcu/stall: Stall detection updates. Introduce RCU CPU Stall notifiers
> > > that allows a subsystem to provide informations to help debugging.
> > > Also cure some false positive stalls.
> >
> > I absolutely detest this stall notifier thing.
> >
> > Putting the stall notifier before the stall message does not "help
> > debugging". Quite the reverse. It ends up being a lovely way to make
> > sure that the debug message is never printed, because there's some
> > entirely untested - and thus buggy - notifier on the chain before the
> > printout from the actual stall code.
> >
> > I've pulled this, but I really want to voice my objection against
> > these kinds of "debugging aids". I have personally spent way too many
> > hours debugging a dead machine because some "debug aid" ended up being
> > untested garbage.
> >
> > If you absolutely think that this is a worthy and useful thing to do,
> > then at the very least make sure that these "debug aids" will always
> > come *after* the core output, and can't make things horrendously
> > worse.
> >
> > But in general, think twice before adding "maybe somebody else wants
> > to print debug info". Because unless you have a really really REALLY
> > good reason for it, it's more likely to hurt than to help.
> >
> > Right now I see no users of this except for the rcu torture code, and
> > it certainly doesn't seem hugely important there. And so I'm wondering
> > what the actual real use-case would be.
>
> I see, one possibility is to revert this and switch to normal calls
> for any future debug information to add from another subsystem. I'll
> wait for Paul's opinion...

The use case thus far is where the RCU CPU stall warning is due to
locks being spun on or held for excessive periods of time, and then
the called code prints out the relevant debug information. In this
particular case, the RCU CPU stall warning message is just added noise.
And if we were to print the RCU CPU stall warning first, we would
likely disturb the locking state, thus rendering the corresponding
debug information useless.

But I completely agree that a poorly planned use of this facility would
have all the problems that Linus has seen in the past.

Would it help if we make rcu_stall_chain_notifier_register() print a
suitably obnoxious message saying that future RCU CPU stall warnings
might be unreliable?
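
For illustration only, the registration-time warning idea can be sketched
in userspace C. This is not the kernel implementation: struct
stall_notifier, stall_notifier_register(), stall_notifier_call_chain() and
example_lock_dump() are hypothetical stand-ins for the kernel's notifier
chain machinery, and the warning text is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a kernel notifier block. */
struct stall_notifier {
	int (*call)(unsigned long action, void *data);
	struct stall_notifier *next;
};

/* Head of the (hypothetical) stall-notifier chain. */
struct stall_notifier *stall_chain;

/*
 * Append a notifier to the chain, complaining loudly at registration
 * time so that anyone reading the log knows that later stall warnings
 * might be affected. Returns 0 on success, -1 on a bad argument.
 */
int stall_notifier_register(struct stall_notifier *n)
{
	struct stall_notifier **pp = &stall_chain;

	if (!n || !n->call)
		return -1;

	fprintf(stderr,
		"WARNING: stall notifier registered, future stall warnings might be unreliable\n");

	/* Walk to the tail so notifiers run in registration order. */
	while (*pp)
		pp = &(*pp)->next;
	n->next = NULL;
	*pp = n;
	return 0;
}

/* Invoke each notifier in order; returns how many were called. */
int stall_notifier_call_chain(unsigned long action, void *data)
{
	struct stall_notifier *n;
	int called = 0;

	for (n = stall_chain; n; n = n->next) {
		n->call(action, data);
		called++;
	}
	return called;
}

/* Example callback: stands in for a subsystem dumping lock state. */
int example_lock_dump(unsigned long action, void *data)
{
	int *hits = data;

	assert(hits != NULL);
	printf("lock debug dump for action %lu\n", action);
	(*hits)++;
	return 0;
}
```

In this sketch the warning is emitted once per registration, before any
stall has happened, so it cannot be suppressed by a buggy callback that
misbehaves later when the chain is actually invoked.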

							Thanx, Paul