Message-ID: <de21f62b-c246-4ff7-9825-600fe19af28a@paulmck-laptop>
Date: Sun, 7 Dec 2025 20:57:02 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Borislav Petkov <bp@...en8.de>
Cc: iommu@...ts.linux.dev, Joerg Roedel <joro@...tes.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
linux-kernel@...r.kernel.org
Subject: Re: amd iommu: rcu: INFO: rcu_preempt detected expedited stalls on
CPUs/tasks: { 0-.... } 8 jiffies s: 113 root: 0x1/.
On Thu, Dec 04, 2025 at 12:42:52PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 04, 2025 at 03:45:05PM +0100, Borislav Petkov wrote:
> > On Wed, Dec 03, 2025 at 09:16:37AM -0800, Paul E. McKenney wrote:
> > > Or to some value that works for you. But if you are not looking to be
> > > an expedited RCU CPU stall-warning pioneer, yes, setting it to zero is
> > > a good approach.
> > >
> > > If you would like to be a more sane pioneer, setting it to (say) 11000
> > > (or 11 seconds) could be appropriate. But what fun is sanity? ;-)
> >
> > Oh, I have a lot of excitement even without RCU experiments. :-P
>
> ;-) ;-) ;-)
>
> > But if you need me to try things, lemme know.
> >
> > For now I've simply reset the values as to what defconfig sets them to:
> >
> > CONFIG_RCU_CPU_STALL_TIMEOUT=21
> > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
>
> Sounds good, and thank you!

But this turned out to be too simple to fail to address (famous last
words!). Please see below.

Thanx, Paul

------------------------------------------------------------------------
commit 9776d62e236bef99859e9067f540aa6f6683b432
Author: Paul E. McKenney <paulmck@...nel.org>
Date: Sun Dec 7 20:49:35 2025 -0800

rcu: Make expedited RCU CPU stall warnings detect stall-end races

If an expedited RCU CPU stall ends just at the stall-warning timeout,
the current code will print an expedited stall-warning message, but one
that doesn't identify any CPUs or tasks causing the stall. This is most
likely to happen for short-timeout stalls, for example, the 20-millisecond
timeouts that are sometimes used for small embedded devices. Needless to
say, these semi-empty stall-warning messages can be rather confusing.

One option would be to suppress the stall-warning message entirely in
this case, but the near-miss information can be quite valuable.

This commit therefore detects this race condition and emits an "INFO:
Expedited stall ended before state dump start" message to clarify matters.

Reported-by: Borislav Petkov <bp@...en8.de>
Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6058a734090c1..fd02cd12b7980 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -589,7 +589,12 @@ static void synchronize_rcu_expedited_stall(unsigned long jiffies_start, unsigne
 	pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
 		j - jiffies_start, rcu_state.expedited_sequence, data_race(rnp_root->expmask),
 		".T"[!!data_race(rnp_root->exp_tasks)]);
-	if (ndetected) {
+	if (!ndetected) {
+		// This is invoked from the grace-period worker, so
+		// a new grace period cannot have started.  And if this
+		// worker were stalled, we would not get here.  ;-)
+		pr_err("INFO: Expedited stall ended before state dump start\n");
+	} else {
 		pr_err("blocking rcu_node structures (internal RCU debug):");
 		rcu_for_each_node_breadth_first(rnp) {
 			if (rnp == rnp_root)
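
For anyone who wants to poke at the new branch outside the kernel, the
decision this hunk makes can be modeled in user space roughly as follows.
This is only a sketch: the enum and both function names are hypothetical
and appear nowhere in kernel/rcu/tree_exp.h, and the only things taken
from the patch are the meaning of ndetected (the count of CPUs and tasks
still found blocking the expedited grace period) and the two message
strings.

```c
/* Hypothetical user-space model; none of these names exist in the kernel. */
enum stall_report {
	STALL_REPORT_ENDED_EARLY,	/* no CPU or task was still blocking */
	STALL_REPORT_DUMP_NODES,	/* at least one holdout was identified */
};

/*
 * Model of the branch changed by the patch.  ndetected stands for the
 * number of CPUs and tasks that the stall-warning code found still
 * blocking the expedited grace period when it scanned for them.
 */
static enum stall_report classify_stall(int ndetected)
{
	return ndetected ? STALL_REPORT_DUMP_NODES : STALL_REPORT_ENDED_EARLY;
}

/* Return the text that follows the stall header in each case. */
static const char *stall_report_message(int ndetected)
{
	if (classify_stall(ndetected) == STALL_REPORT_ENDED_EARLY)
		return "INFO: Expedited stall ended before state dump start";
	return "blocking rcu_node structures (internal RCU debug):";
}
```

Calling stall_report_message(0) yields the new INFO line, and any nonzero
count selects the existing rcu_node dump header, which matches the
if/else ordering in the hunk above.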