linux-kernel - Re: [PATCH] arm64/smp: Move rcu_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201105222242.GA8842@willie-the-truck>
Date:   Thu, 5 Nov 2020 22:22:42 +0000
From:   Will Deacon <will@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>, Qian Cai <cai@...hat.com>
Cc:     catalin.marinas@....com, kernel-team@...roid.com,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier

On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote:
> On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote:
> > The call to rcu_cpu_starting() in secondary_start_kernel() is not early
> > enough in the CPU-hotplug onlining process, which results in lockdep
> > splats as follows:
> > 
> >  WARNING: suspicious RCU usage
> >  -----------------------------
> >  kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
> > 
> > [...]
> 
> Applied to arm64 (for-next/fixes), thanks!
> 
> [1/1] arm64/smp: Move rcu_cpu_starting() earlier
>       https://git.kernel.org/arm64/c/ce3d31ad3cac

Hmm, this patch has caused a regression in the case that we fail to
online a CPU because it has incompatible CPU features and so we park it
in cpu_die_early(). We now get an endless spew of RCU stalls because the
core will never come online, but is being tracked by RCU. So I'm tempted
to revert this and live with the lockdep warning while we figure out a
proper fix.

What's the correct say to undo rcu_cpu_starting(), given that we cannot
invoke the full hotplug machinery here? Is it correct to call
rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the
CPU doing cpu_up(), or should we do something else?

Will