Message-ID: <CAD=FV=WBcOCGbD0haRYLGgAFpDhfoqMW8mvj9DEA0CSPHG3Owg@mail.gmail.com>
Date: Wed, 30 Oct 2024 13:12:01 -0700
From: Doug Anderson <dianders@...omium.org>
To: paulmck@...nel.org
Cc: Cheng-Jui Wang (王正睿) <Cheng-Jui.Wang@...iatek.com>,
"sumit.garg@...aro.org" <sumit.garg@...aro.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
"frederic@...nel.org" <frederic@...nel.org>, wsd_upstream <wsd_upstream@...iatek.com>,
Bobule Chang (張弘義) <bobule.chang@...iatek.com>,
"mark.rutland@....com" <mark.rutland@....com>, "kernel-team@...a.com" <kernel-team@...a.com>,
"joel@...lfernandes.org" <joel@...lfernandes.org>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()

Hi,

On Wed, Oct 30, 2024 at 6:54 AM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> > > We do assume that nmi_trigger_cpumask_backtrace() uses true NMIs, so,
> > > yes, nmi_trigger_cpumask_backtrace() should use true NMIs, just like
> > > the name says. ;-)
> >
> > In the comments of following patch, the arm64 maintainers have
> > differing views on the use of nmi_trigger_cpumask_backtrace(). I'm a
> > bit confused and unsure which perspective is more reasonable.
> >
> > https://lore.kernel.org/all/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid/
>
> I clearly need to have a chat with the arm64 maintainers, and thank
> you for checking.
>
> > > /*
> > > * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the name,
> > > * nothing about it truly needs to be implemented using an NMI, it's
> > > * just that it's _allowed_ to work with NMIs. If ipi_should_be_nmi()
> > > * returned false our backtrace attempt will just use a regular IPI.
> > > */
> >
> > > Alternatively, arm64 could continue using nmi_trigger_cpumask_backtrace()
> > > with normal interrupts (for example, on SoCs not implementing true NMIs),
> > > but have a short timeout (maybe a few jiffies?) after which it returns
> > > false (and presumably also cancels the backtrace request so that when
> > > the non-NMI interrupt eventually does happen, its handler simply returns
> > > without backtracing). This should be implemented using atomics to avoid
> > > deadlock issues. This alternative approach would provide accurate arm64
> > > backtraces in the common case where interrupts are enabled, but allow
> > > a graceful fallback to remote tracing otherwise.
> > >
> > > Would you be interested in working this issue, whatever solution the
> > > arm64 maintainers end up preferring?
> >
> > The 10-second timeout is hard-coded in nmi_trigger_cpumask_backtrace().
> > It is shared code and not architecture-specific. Currently, I haven't
> > thought of a feasible solution. I have also CC'd the authors of the
> > aforementioned patch to see if they have any other ideas.
>
> It should be possible for arm64 to have an architecture-specific hook
> that enables them to use a much shorter timeout. Or, to eventually
> switch to real NMIs.

Note that:

* Switching to real NMIs is impossible on many existing arm64 CPUs.
The hardware support simply isn't there. Pseudo-NMIs should work fine
and are (in nearly all cases) just as good as NMIs, but they have a
small performance impact. There are also compatibility issues with
some pre-existing bootloaders. ...so code can't assume even pseudo-NMI
will work and needs to be able to fall back. Prior to my recent
changes, arm64 CPUs wouldn't even do stacktraces in some scenarios. Now
at least they fall back to regular IPIs.

* Even if we decided that we absolutely had to disable stacktraces on
arm64 CPUs without some type of NMI, that won't fix arm32. arm32 has
been using regular IPIs for backtraces like this for many, many years.
Maybe folks don't care as much about arm32 anymore, but it feels bad if
we totally break it.

IMO waiting 10 seconds for a backtrace is pretty crazy to begin with.
What about just changing that globally to 1 second? If not, it doesn't
feel like it would be impossibly hard to make an arch-specific
callback to choose the time and that callback could even take into
account whether we managed to get an NMI. I'd be happy to review such
a patch.
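
For what it's worth, here is a rough, untested userspace sketch of the
arch-specific callback idea. Everything in it is made up for
illustration: arch_backtrace_timeout_ms(), backtrace_ipi_is_nmi, and
wait_for_backtraces() are hypothetical names, not existing kernel
interfaces, and the 1-second fallback is just the number floated above.
The loop only mimics the shape of a wait that polls once per
millisecond; it is not the real shared code.

```c
#include <stdbool.h>

/*
 * Hypothetical flag: stands in for however the arch would know that
 * its backtrace IPI is delivered as a (pseudo-)NMI rather than as a
 * regular, maskable IPI.
 */
static bool backtrace_ipi_is_nmi;

/*
 * Hypothetical arch hook: how long to wait for remote CPUs, in ms.
 * With an NMI, keep the long 10-second wait. With a regular IPI, a CPU
 * that has interrupts masked can't answer until it unmasks them, so
 * give up early (the 1 second here is an assumption).
 */
static unsigned int arch_backtrace_timeout_ms(void)
{
	return backtrace_ipi_is_nmi ? 10 * 1000 : 1 * 1000;
}

/*
 * Simulated wait loop. respond_after_ms models how long the slowest
 * CPU takes to answer. Returns true if every CPU answered in time,
 * false if the caller should fall back to something else.
 */
static bool wait_for_backtraces(unsigned int respond_after_ms)
{
	unsigned int timeout = arch_backtrace_timeout_ms();
	unsigned int i;

	for (i = 0; i < timeout; i++) {
		if (i >= respond_after_ms)
			return true;
		/* A kernel version would delay about 1 ms here. */
	}
	return false;
}
```

With backtrace_ipi_is_nmi false, a CPU that would need 5 seconds to
respond makes wait_for_backtraces() return false after the shorter
timeout; with it true, the same CPU still fits within 10 seconds.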

-Doug