Message-ID: <0409e716-5bd5-4501-9a90-3a4aed048c7f@paulmck-laptop>
Date: Thu, 7 Mar 2024 09:49:30 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Stefan Wiehler <stefan.wiehler@...ia.com>
Cc: Russell King <linux@...linux.org.uk>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>,
Boqun Feng <boqun.feng@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>,
linux-arm-kernel@...ts.infradead.org, rcu@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat

On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote:
> > With CONFIG_PROVE_RCU_LIST=y and by executing
> >
> > $ echo 0 > /sys/devices/system/cpu/cpu1/online
> >
> > one can trigger the following Lockdep-RCU splat on ARM:
> >
> > =============================
> > WARNING: suspicious RCU usage
> > 6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
> > -----------------------------
> > kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
> >
> > other info that might help us debug this:
> >
> > RCU used illegally from offline CPU!
> > rcu_scheduler_active = 2, debug_locks = 1
> > no locks held by swapper/1/0.
> >
> > stack backtrace:
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
> > Hardware name: Allwinner sun8i Family
> > unwind_backtrace from show_stack+0x10/0x14
> > show_stack from dump_stack_lvl+0x60/0x90
> > dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
> > lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
> > __lock_acquire from lock_acquire+0x10c/0x348
> > lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
> > _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
> > check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
> > arch_cpu_idle_dead from do_idle+0xbc/0x138
> > do_idle from cpu_startup_entry+0x28/0x2c
> > cpu_startup_entry from secondary_start_kernel+0x11c/0x124
> > secondary_start_kernel from 0x401018a0
> >
> > The CPU is already reported as offline from RCU perspective in
> > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above
> > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
> > ASID spinlock.
> >
> > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
> > online again while the spinlock is held.
> >
> > Signed-off-by: Stefan Wiehler <stefan.wiehler@...ia.com>
>
> From an RCU perspective, this looks plausible. One question
> below.

But one additional caution... If execution is delayed during that call
to idle_task_exit(), RCU will stall and won't have a reasonable way of
motivating this CPU. Such delays could be due to vCPU preemption or
due to firmware grabbing the CPU.

But this is only a caution, not opposition. After all, you could have
the same problem with an online CPU that gets similarly delayed while
its interrupts are disabled.
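
To make the caution concrete, here is a rough userspace sketch of why a
delay inside that window matters. It is a toy model under stated
assumptions, not RCU's actual implementation: the struct and every
toy_*() name are hypothetical, and the only property modeled is that a
grace period cannot end while any CPU marked online has not yet reported
a quiescent state.

```c
#include <stdbool.h>

/*
 * Hypothetical toy model, NOT kernel code: one bit per CPU.
 * online_mask: CPUs the toy "RCU" currently waits on.
 * qs_mask:     CPUs that have reported a quiescent state.
 */
struct toy_rcu {
	unsigned long online_mask;
	unsigned long qs_mask;
};

/* Stand-in for marking a CPU online again; it now owes a quiescent state. */
static void toy_report_cpu_starting(struct toy_rcu *r, unsigned int cpu)
{
	r->online_mask |= 1UL << cpu;
	r->qs_mask &= ~(1UL << cpu);
}

/* Stand-in for marking a CPU offline; the toy "RCU" stops waiting on it. */
static void toy_report_cpu_dead(struct toy_rcu *r, unsigned int cpu)
{
	r->online_mask &= ~(1UL << cpu);
}

/* A CPU passes through a quiescent state. */
static void toy_report_qs(struct toy_rcu *r, unsigned int cpu)
{
	r->qs_mask |= 1UL << cpu;
}

/* The grace period may end only if no online CPU is still outstanding. */
static bool toy_gp_can_end(const struct toy_rcu *r)
{
	return (r->online_mask & ~r->qs_mask) == 0;
}
```

In this model, once toy_report_cpu_starting() has run for CPU 1,
toy_gp_can_end() stays false until CPU 1 either reports a quiescent
state or is reported dead again. If the CPU is preempted or taken by
firmware in between, nothing else in the model can clear that bit.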

Thanx, Paul

> > ---
> > arch/arm/kernel/smp.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > index 3431c0553f45..6875e2c5dd50 100644
> > --- a/arch/arm/kernel/smp.c
> > +++ b/arch/arm/kernel/smp.c
> > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
> >  {
> >  	unsigned int cpu = smp_processor_id();
> >
> > +	/*
> > +	 * Briefly report CPU as online again to avoid false positive
> > +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> > +	 * spinlock.
> > +	 */
> > +	rcutree_report_cpu_starting(cpu);
> >  	idle_task_exit();
> > +	rcutree_report_cpu_dead();
> >
> >  	local_irq_disable();
>
> Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain
> bitterly via lockdep if interrupts are enabled. And the call sites have
> interrupts disabled. So I don't understand what this local_irq_disable()
> is needed for.