linux-kernel - Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD=FV=WsM=om=xobfWkC8nLQNK89xdyNNX=L1=DjD79ZPXYxRg@mail.gmail.com>
Date: Wed, 30 Oct 2024 17:21:13 -0700
From: Doug Anderson <dianders@...omium.org>
To: paulmck@...nel.org
Cc: Cheng-Jui Wang (王正睿) <Cheng-Jui.Wang@...iatek.com>, 
	"sumit.garg@...aro.org" <sumit.garg@...aro.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "rostedt@...dmis.org" <rostedt@...dmis.org>, 
	"frederic@...nel.org" <frederic@...nel.org>, wsd_upstream <wsd_upstream@...iatek.com>, 
	Bobule Chang (張弘義) <bobule.chang@...iatek.com>, 
	"mark.rutland@....com" <mark.rutland@....com>, "kernel-team@...a.com" <kernel-team@...a.com>, 
	"joel@...lfernandes.org" <joel@...lfernandes.org>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()

Hi,

On Wed, Oct 30, 2024 at 4:27 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Wed, Oct 30, 2024 at 01:12:01PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Wed, Oct 30, 2024 at 6:54 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> > >
> > > > > We do assume that nmi_trigger_cpumask_backtrace() uses true NMIs, so,
> > > > > yes, nmi_trigger_cpumask_backtrace() should use true NMIs, just like
> > > > > the name says.  ;-)
> > > >
> > > > In the comments of following patch, the arm64 maintainers have
> > > > differing views on the use of nmi_trigger_cpumask_backtrace(). I'm a
> > > > bit confused and unsure which perspective is more reasonable.
> > > >
> > > > https://lore.kernel.org/all/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid/
> > >
> > > I clearly need to have a chat with the arm64 maintainers, and thank
> > > you for checking.
> > >
> > > > > /*
> > > > >  * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the
> > > > name,
> > > > >  * nothing about it truly needs to be implemented using an NMI, it's
> > > > >  * just that it's _allowed_ to work with NMIs. If ipi_should_be_nmi()
> > > > >  * returned false our backtrace attempt will just use a regular IPI.
> > > > >  */
> > > >
> > > > > Alternatively, arm64 could continue using nmi_trigger_cpumask_backtrace()
> > > > > with normal interrupts (for example, on SoCs not implementing true NMIs),
> > > > > but have a short timeout (maybe a few jiffies?) after which its returns
> > > > > false (and presumably also cancels the backtrace request so that when
> > > > > the non-NMI interrupt eventually does happen, its handler simply returns
> > > > > without backtracing).  This should be implemented using atomics to avoid
> > > > > deadlock issues.  This alternative approach would provide accurate arm64
> > > > > backtraces in the common case where interrupts are enabled, but allow
> > > > > a graceful fallback to remote tracing otherwise.
> > > > >
> > > > > Would you be interested in working this issue, whatever solution the
> > > > > arm64 maintainers end up preferring?
> > > >
> > > > The 10-second timeout is hard-coded in nmi_trigger_cpumask_backtrace().
> > > > It is shared code and not architecture-specific. Currently, I haven't
> > > > thought of a feasible solution. I have also CC'd the authors of the
> > > > aforementioned patch to see if they have any other ideas.
> > >
> > > It should be possible for arm64 to have an architecture-specific hook
> > > that enables them to use a much shorter timeout.  Or, to eventually
> > > switch to real NMIs.
> >
> > Note that:
> >
> > * Switching to real NMIs is impossible on many existing arm64 CPUs.
> > The hardware support simply isn't there. Pseudo-NMIs should work fine
> > and are (in nearly all cases) just as good as NMIs but they have a
> > small performance impact. There are also compatibility issues with
> > some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI
> > will work and needs to be able to fall back. Prior to my recent
> > changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now
> > at least they fall back to regular IPIs.
>
> But those IPIs are of no help whatsoever when the target CPU is spinning
> with interrupts disabled, which is the scenario under discussion.

Right that we can't trace the target CPU spinning with interrupts
disabled. ...but in the case where NMI isn't available it _still_
makes sense to make arch_trigger_cpumask_backtrace() fall back to IPI
because:

1. There are many cases where the trigger.*cpu.*backtrace() class of
functions are used to trace CPUs that _aren't_ looping with interrupts
disabled. We still want to be able to backtrace in that case. For
instance, you can see in "panic.c" that there are cases where
trigger_all_cpu_backtrace() is called. It's valuable to make cases
like this (where there's no indication that a CPU is locked) actually
work.

2. Even if there's a CPU that's looping with interrupts disabled and
we can't trace it because of no NMI, it could still be useful to get
backtraces of other CPUs. It's possible that what the other CPUs are
doing could give a hint about the CPU that's wedged. This isn't a
great hint, but some info is better than no info.


> Hence my suggestion that arm64, when using IRQs to request stack
> backtraces, have an additional short timeout (milliseconds) in
> order to fall back to remote stack tracing.  In many cases, this
> fallback happens automatically, as you can see in dump_cpu_task().
> If trigger_single_cpu_backtrace() returns false, the stack is dumped
> remotely.

I think the answer here is that the API needs to change. The whole
boolean return value for trigger.*cpu.*backtrace() is mostly ignored
by callers. Yes, you've pointed at one of the two places that it's not
ignored and it tries a reasonable fallback, but all the other callers
just silently do nothing when the function returns false. That led to
many places where arm64 devices were simply not getting _any_
stacktrace.

Perhaps we need some type of API that says "I actually have a
fallback, so if you don't think the stracktrace will succeed then it's
OK to return false".


> > * Even if we decided that we absolutely had to disable stacktrades on
> > arm64 CPUs without some type of NMI, that won't fix arm32. arm32 has
> > been using regular IPIs for backtraces like this for many, many years.
> > Maybe folks don't care as much about arm32 anymore but it feels bad if
> > we totally break it.
>
> Because arm32 isn't used much for datacenter workloads, it will not
> be suffering from this issue.  Not enough to have motivated anyone to
> fix it, anyway.  In contrast, in the datacenter, CPUs really can and
> do have interrupts disabled for many seconds.  (No, this is not a good
> thing, but it is common enough for us to need to avoid disabling other
> debugging facilities.)
>
> So it might well be that arm32 will continue to do just fine with
> IRQ-based stack tracing.  Or maybe someday arm32 will also need to deal
> with this same issue.  But no "maybe" for arm64, given its increasing
> use in the datacenter.

IMO the answer here is that anyone in datacenters should just turn on
pseudo NMI (or NMI on newer arm64 CPUs).


> > IMO waiting 10 seconds for a backtrace is pretty crazy to begin with.
> > What about just changing that globally to 1 second? If not, it doesn't
> > feel like it would be impossibly hard to make an arch-specific
> > callback to choose the time and that callback could even take into
> > account whether we managed to get an NMI. I'd be happy to review such
> > a patch.
>
> Let's please keep the 10-second timeout for NMI-based backtraces.
>
> But I take it that you are not opposed to a shorter timeout for the
> special case of IRQ-based stack backtrace requests?

IMO the 10 second is still awfully long (HW watchdogs can trigger
during that time), but I'm not that upset by it. I'm OK with shorter
timeouts for IRQ-based ones, though I'm not sure I'd be OK w/ timeouts
measured in milliseconds unless the caller has actually said that they
can handle a "false" return or the caller says that it's more
important to be quick than to wait for a long time.


-Doug