Message-ID: <c83bea5c-b577-4b6a-bf41-6ac7433f15bc@paulmck-laptop>
Date: Wed, 1 Nov 2023 10:40:14 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Frederic Weisbecker <frederic@...nel.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, rcu@...r.kernel.org,
Boqun Feng <boqun.feng@...il.com>,
Joel Fernandes <joel@...lfernandes.org>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Uladzislau Rezki <urezki@...il.com>,
Z qiang <qiang.zhang1211@...il.com>
Subject: Re: [GIT PULL] RCU changes for v6.7
On Wed, Nov 01, 2023 at 07:11:54AM -1000, Linus Torvalds wrote:
> On Tue, 31 Oct 2023 at 15:08, Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > Here are the ways forward I can see:
> >
> > 1. Status quo. This has all the issues that you call out.
> > People will hurt themselves with it and consume time and effort.
> > So let's not do this.
>
> Well, at a *minimum*, I really want that notifier chain call to be
> done *after* the core printk's.
>
> That way, if it deadlocks or does something else stupid, at least the
> core printouts make it out.
>
> IOW, I think the notifier should be done perhaps just before the
> "panic_on_rcu_stall()" call, not at the top before you've even
> reported any stall conditions at all.

Understood. But my problem is that the core printk()s destroy the state
that the notifier is trying to output.

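The following is a minimal userspace sketch of the ordering Linus suggests above: core stall printouts first, then the notifier chain, then the panic_on_rcu_stall() check. It is not kernel code. Every identifier in it (report_stall(), register_stall_notifier(), dump_lock_state(), event_log) is hypothetical, and the "printouts" are just strings appended to a log so the order can be checked. It does not address Paul's concern that the core printouts may disturb the state the notifier wants to dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */

#define MAX_NOTIFIERS 4

typedef void (*stall_notifier_fn)(void);

static stall_notifier_fn stall_chain[MAX_NOTIFIERS];
static int n_stall_notifiers;

/* Records each step, in order, so the ordering can be verified. */
static char event_log[256];

static void log_event(const char *what)
{
	strncat(event_log, what, sizeof(event_log) - strlen(event_log) - 1);
	strncat(event_log, ";", sizeof(event_log) - strlen(event_log) - 1);
}

/* Appends a callback to the (hypothetical) stall notifier chain. */
static int register_stall_notifier(stall_notifier_fn fn)
{
	if (n_stall_notifiers >= MAX_NOTIFIERS)
		return -1;
	stall_chain[n_stall_notifiers++] = fn;
	return 0;
}

static void report_stall(bool panic_on_stall)
{
	/* 1. Core stall printouts go out first... */
	log_event("core");

	/* 2. ...then the notifier chain, so a notifier that deadlocks
	 *    cannot suppress the core printouts... */
	for (int i = 0; i < n_stall_notifiers; i++)
		stall_chain[i]();

	/* 3. ...and only then the stand-in for panic_on_rcu_stall(). */
	if (panic_on_stall)
		log_event("panic");
}

/* Hypothetical debug-only notifier, e.g. one dumping lock state. */
static void dump_lock_state(void)
{
	log_event("notifier");
}
```

After registering dump_lock_state() and calling report_stall(true), event_log reads "core;notifier;panic;".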
> And yes, I think the trace_rcu_stall_warning() might be better off
> later too, but at least trace events are things that get regular
> testing in nasty conditions (including NMI etc), so I'm *much* less
> worried about those than about "random developers who think they know
> what they do and add a notifier".

Agreed, this is a special debug facility, not something that anyone
should use in production. And also not something that should be used
where gdb would do the job.

> And yes, I do think the notifier should be narrowed down a lot, if you
> actually want to keep it.

Understood, hence a new default-disabled Kconfig option that depends on
RCU_EXPERT and DEBUG_KERNEL, along with a default-disabled kernel
boot parameter. Both have to be enabled before anything
happens.

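A minimal userspace sketch of the two-level gating described above, in which both a build-time option and a boot-time parameter must be enabled. All names here (STALL_NOTIFIER_CONFIG, stall_notifiers_param, stall_notifier_boot_check(), register_stall_notifier_gated()) are hypothetical stand-ins, not the actual Kconfig symbol or boot parameter. The macro models the default-disabled Kconfig option, and the boolean models the default-disabled boot parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a default-disabled Kconfig option.
 * Build with -DSTALL_NOTIFIER_CONFIG=1 to model enabling it.
 */
#ifndef STALL_NOTIFIER_CONFIG
#define STALL_NOTIFIER_CONFIG 0
#endif

/* Hypothetical stand-in for a default-disabled boot parameter. */
static bool stall_notifiers_param;

/* Set once the boot-time warning has been emitted. */
static bool boot_splat_printed;

/* Called once at "boot": warn loudly if the debug facility is armed. */
static void stall_notifier_boot_check(void)
{
	if (STALL_NOTIFIER_CONFIG && stall_notifiers_param) {
		fprintf(stderr, "WARNING: stall notifiers enabled, debug use only!\n");
		boot_splat_printed = true;
	}
}

/* Registration is refused unless BOTH the option and the parameter are set. */
static int register_stall_notifier_gated(void)
{
	if (!STALL_NOTIFIER_CONFIG || !stall_notifiers_param)
		return -EINVAL;
	return 0;
}
```

With the default build, setting the parameter alone still leaves registration failing with -EINVAL and prints no warning. Building with -DSTALL_NOTIFIER_CONFIG=1 and setting the parameter makes stall_notifier_boot_check() print the warning and registration return 0.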
> I did not actually hear you say that there is a good use-case for it.
> I only saw you say "Those of us who need this", without showing *any*
> kind of indication of why anybody would use it in reality.
>
> Why the secrecy? There is certainly no current user, nor any
> description of what a user would be and what makes that notifier
> useful.
>
> The commit message also just says "It is sometimes helpful" and some
> strange reference to "the subsystem causing the stall to dump its
> state". It all sounds very fishy. Why would anybody ever have a known
> subsystem causing RCU stalls? Except, of course, for the rcutorture
> testing.

One use case is dumping out the qspinlock state for an extremely
rare lockup. If you even look at the system cross-eyed, the lockup
goes away. And yes, I should have mentioned this in the commit
log, and I apologize for having failed to do so. I do not expect
that the state-dump code would ever be appropriate for mainline.

> Anyway, that all absolutely SCREAMS to me "this is not something
> useful in any normal kernel", and so yes:

Agreed, definitely not for any normal kernel!

> > 3. Add a default-n Kconfig option that depends on RCU_EXPERT
> > and DEBUG_KERNEL, so that these problems can only arise in
> > specially built kernels.
> >
> > 4. Same as #3, but use a kernel boot parameter instead of a
> > Kconfig option.
>
> let's make it clear that this is *not* something that any upstream
> kernel would ever do, and the *only* possible use for it is some kind
> of external temporary debug patch.
>
> See why I so hate things like this? Let's head off any crazy use long
> *long* before somebody decides that "Oh, I want to use this".

You are absolutely right, a debug tool with this many sharp edges should
definitely not be default-enabled. It also needs some scary words in the
Kconfig help text, and a boot-time splat to make people think twice
before using it.

Apologies for not having thought this through!
I will send a fixup patch before the end of today.
Thanx, Paul