Message-ID: <861qhoyk3x.wl-maz@kernel.org>
Date: Mon, 03 Jul 2023 18:34:42 +0100
From: Marc Zyngier <maz@...nel.org>
To: Conor Dooley <conor@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Anup Patel <apatel@...tanamicro.com>,
Palmer Dabbelt <palmer@...osinc.com>,
lkp <oliver.sang@...el.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
lkp@...el.com
Subject: Re: [mm] 408579cd62: WARNING:suspicious_RCU_usage
On Mon, 03 Jul 2023 18:20:52 +0100,
Conor Dooley <conor@...nel.org> wrote:
>
> On Mon, Jul 03, 2023 at 10:07:28AM -0700, Linus Torvalds wrote:
> > On Mon, 3 Jul 2023 at 10:00, Conor Dooley <conor@...nel.org> wrote:
> > >
> > > I'm not entirely sure if it is related, as stuff in the guts of mm like
> > > this is beyond me, but I've been seeing similar warnings on RISC-V.
> >
> > No, that RISC-V warning is also about bad RCU usage, but that's a
> > different thing.
> >
> > > RCU used illegally from offline CPU!
> > > rcu_scheduler_active = 1, debug_locks = 1
> > > 1 lock held by swapper/1/0:
> > > #0: ffffffff8169ceb0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x32
> > >
> > > stack backtrace:
> > > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-10173-ga901a3568fd2 #1
> > > Hardware name: riscv-virtio,qemu (DT)
> > > Call Trace:
> > > [<ffffffff80006a20>] show_stack+0x2c/0x38
> > > [<ffffffff80af3ee0>] dump_stack_lvl+0x5e/0x80
> > > [<ffffffff80af3f16>] dump_stack+0x14/0x1c
> > > [<ffffffff80083ff0>] lockdep_rcu_suspicious+0x19e/0x232
> > > [<ffffffff80ad4802>] mtree_load+0x18a/0x3b6
> > > [<ffffffff80091632>] __irq_get_desc_lock+0x2c/0x82
> > > [<ffffffff80094722>] enable_percpu_irq+0x36/0x9e
> > > [<ffffffff800087d4>] riscv_ipi_enable+0x32/0x4e
> > > [<ffffffff80008692>] smp_callin+0x24/0x66
> >
> > This is also triggering on the maple tree sanity checks, but it's a
> > different maple tree, and a different code sequence.
> >
> > And a different case of suspicious RCU usage - not a lack of locking,
> > but simply using RCU before marking the CPU online.
>
> Ah, I probably should've known from the
> RCU used illegally from offline CPU!
> that it was different.
>
> > I suspect the riscv_ipi_enable() in the RISC-V version of smp_callin()
> > needs to be moved down to below the
> >
> > set_cpu_online(curr_cpuid, 1);
> >
> > or was there some reason why it needed to be done quite _that_ early
> > in commit 832f15f42646 ("RISC-V: Treat IPIs as normal Linux IRQs")?
> >
> > Added guilty parties to the cc.
>
> Taking the rationale & potential problems out of the equation, that
> code movement does suppress the complaints from rcu/maple tree,
> thanks.

Comparing with what we do on arm64, a less radical change would be to
move the IPI init after notify_cpu_starting(), which explicitly
enables RCU usage.

Something like:

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index bb0b76e1a6d4..f4d6acb38dd0 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -238,10 +238,11 @@ asmlinkage __visible void smp_callin(void)
 	mmgrab(mm);
 	current->active_mm = mm;
 
-	riscv_ipi_enable();
-
 	store_cpu_topology(curr_cpuid);
 	notify_cpu_starting(curr_cpuid);
+
+	riscv_ipi_enable();
+
 	numa_add_cpu(curr_cpuid);
 	set_cpu_online(curr_cpuid, 1);
 	probe_vendor_features(curr_cpuid);

which I obviously haven't tested at all.
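
To make the ordering argument concrete, here is a minimal user-space sketch. It is not kernel code: every name prefixed with model_, struct cpu_model, and simulate_bringup() are hypothetical stand-ins invented for illustration. It assumes, as stated above, that notify_cpu_starting() is the point from which RCU may be used, and that enabling the IPI ends up in an RCU-protected irq descriptor lookup (the mtree_load() frame in the quoted trace).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model of a secondary CPU; not kernel code. */
struct cpu_model {
	bool rcu_watching;	/* set by the stand-in for notify_cpu_starting() */
	bool ipi_enabled;
	bool online;
	int rcu_warnings;	/* simulated "suspicious RCU usage" reports */
};

/* Stand-in for the RCU-protected irq descriptor lookup seen in the trace. */
static void model_irq_desc_lookup(struct cpu_model *cpu)
{
	if (!cpu->rcu_watching) {
		cpu->rcu_warnings++;
		fprintf(stderr, "model: RCU used illegally from offline CPU!\n");
	}
}

/* Stand-in for riscv_ipi_enable(): enabling the IRQ looks up its descriptor. */
static void model_ipi_enable(struct cpu_model *cpu)
{
	model_irq_desc_lookup(cpu);
	cpu->ipi_enabled = true;
}

/* Stand-in for notify_cpu_starting(): assumed to make RCU usable here. */
static void model_notify_cpu_starting(struct cpu_model *cpu)
{
	cpu->rcu_watching = true;
}

/* Stand-in for set_cpu_online(). */
static void model_set_cpu_online(struct cpu_model *cpu)
{
	cpu->online = true;
}

/*
 * Run the modelled bring-up sequence with the IPI enabled either before
 * (current order) or after (proposed order) the notify step, and return
 * the number of simulated RCU warnings.
 */
int simulate_bringup(bool ipi_after_notify)
{
	struct cpu_model cpu = {0};

	if (!ipi_after_notify)
		model_ipi_enable(&cpu);
	model_notify_cpu_starting(&cpu);
	if (ipi_after_notify)
		model_ipi_enable(&cpu);
	model_set_cpu_online(&cpu);

	/* Either order must still end with IPIs enabled and the CPU online. */
	assert(cpu.ipi_enabled && cpu.online);
	return cpu.rcu_warnings;
}
```

In this model, simulate_bringup(false) reports one warning and simulate_bringup(true) reports none. That only mirrors the reasoning above and says nothing about whether the real patch works.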

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.