Message-ID: <ad2a55f9-9bca-41bf-a6ec-efb23eaa778f@paulmck-laptop>
Date: Sat, 22 Nov 2025 10:47:17 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Gabriele Monaco <gmonaco@...hat.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Michael Jeanson <mjeanson@...icios.com>,
Jens Axboe <axboe@...nel.dk>,
"Gautham R. Shenoy" <gautham.shenoy@....com>,
Florian Weimer <fweimer@...hat.com>,
Tim Chen <tim.c.chen@...el.com>, Yury Norov <yury.norov@...il.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: Re: [patch V5 09/20] cpumask: Cache num_possible_cpus()

On Fri, Nov 21, 2025 at 11:56:44PM +0100, Marek Szyprowski wrote:
> Hi
>
> On 19.11.2025 18:27, Thomas Gleixner wrote:
> > Reevaluating num_possible_cpus() over and over does not make sense. That
> > becomes a constant after init as cpu_possible_mask is marked ro_after_init.
> >
> > Cache the value during initialization and provide that for consumption.
> >
> > Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> > Reviewed-by: Yury Norov <yury.norov@...il.com>
> > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> > Reviewed-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
>
> This patch landed recently in linux-next as commit d0f23ccf6ba9
> ("cpumask: Cache num_possible_cpus()"). I found that it triggers the
> following warning during boot on some of my test systems (namely
> Raspberry Pi 3 and 4):

I also bisected to this commit on an ARM Neoverse V2. It happens twice
at about the same time, so the splats are interleaved, but they start
with an almost-NULL pointer dereference during either TASKS01 or TRACE02
callback processing. Things go surprisingly normally after that, at
least until a hang on shutdown. This sometimes happens quite late, as in
*minutes* after boot.

I do not see this on x86, nor on any rcutorture scenarios other
than TASKS01 and TRACE02. These are unusual in that they are the
only variants of the common-code tasks-RCU flavors that build with
CONFIG_PROVE_LOCKING=y.

My reproducer is:

tools/testing/selftests/rcutorture/bin/torture.sh --duration 10 --do-none --do-rcutorture --configs-rcutorture "TRACE01 TRACE02"

My current guess is that the snapshot is taken too early, though I would
be more confident of that if it happened on TREE01, in which CPUs come
online quite late.
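
For reasoning about when the cached count and the mask weight can disagree,
here is a userspace sketch of the bookkeeping in the quoted patch. Everything
named model_* or MODEL_* is a hypothetical stand-in, not a kernel symbol. The
sketch assumes the mask starts out empty while the counter starts at the
NR_CPUS stand-in, and that CPUs are then marked possible one at a time. Whether
any real boot path does that has not been verified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for NR_CPUS; must not exceed the bits in a long. */
#define MODEL_NR_CPUS 8

/* Stand-in for __cpu_possible_mask; assumed to start out empty. */
static unsigned long model_mask;

/* Stand-in for __num_possible_cpus; mirrors the "= NR_CPUS" initializer. */
static unsigned int model_cached = MODEL_NR_CPUS;

/* Portable population count over the model mask (cpumask_weight analogue). */
static unsigned int model_weight(void)
{
	unsigned long m = model_mask;
	unsigned int n = 0;

	while (m) {
		n += m & 1UL;
		m >>= 1;
	}
	return n;
}

/* Same increment/decrement logic as set_cpu_possible() in the patch. */
static void model_set_cpu_possible(unsigned int cpu, bool possible)
{
	unsigned long bit;

	assert(cpu < MODEL_NR_CPUS);
	bit = 1UL << cpu;

	if (possible) {
		if (!(model_mask & bit)) {
			model_mask |= bit;
			model_cached++;
		}
	} else {
		if (model_mask & bit) {
			model_mask &= ~bit;
			model_cached--;
		}
	}
}

/* Same recompute-from-mask logic as init_cpu_possible() in the patch. */
static void model_init_cpu_possible(unsigned long src)
{
	model_mask = src;
	model_cached = model_weight();
}

/* True when the cached count agrees with the mask weight. */
static bool model_consistent(void)
{
	return model_cached == model_weight();
}
```

In this model, calling model_set_cpu_possible(0, true) before any
model_init_cpu_possible() leaves model_consistent() false, because the
counter was never recomputed from the mask. A comparison of this kind could
be used as a debug check, but that is a suggestion rather than something the
patch does.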

							Thanx, Paul

> kvm [1]: nv: 568 coarse grained trap handlers
> kvm [1]: IPA Size Limit: 40 bits
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 1 at arch/arm64/kvm/vmid.c:183
> kvm_arm_vmid_alloc_init+0x98/0xac
> Modules linked in:
> CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc4+ #16195 PREEMPT
> Hardware name: Raspberry Pi 3 Model B (DT)
> pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : kvm_arm_vmid_alloc_init+0x98/0xac
> lr : kvm_arm_vmid_alloc_init+0x1c/0xac
> ...
> Call trace:
> kvm_arm_vmid_alloc_init+0x98/0xac (P)
> kvm_arm_init+0x144/0x153c
> do_one_initcall+0x64/0x308
> kernel_init_freeable+0x280/0x4fc
> kernel_init+0x20/0x1d8
> ret_from_fork+0x10/0x20
> irq event stamp: 65606
> hardirqs last enabled at (65605): [<ffff8000801534e4>]
> __up_console_sem+0x6c/0x80
> hardirqs last disabled at (65606): [<ffff800081358f7c>] el1_brk64+0x20/0x60
> softirqs last enabled at (65262): [<ffff8000800c4b1c>]
> handle_softirqs+0x4c4/0x4dc
> softirqs last disabled at (65257): [<ffff8000800106a0>]
> __do_softirq+0x14/0x20
> ---[ end trace 0000000000000000 ]---
> kvm [1]: Hyp nVHE mode initialized successfully
>
>
> Reverting it on top of linux-next fixes the issue. Let me know how can I
> help debugging it.
>
>
> > ---
> > V4: Add comment why this is not marked __init ....
> > V2: New patch
> > ---
> > include/linux/cpumask.h | 10 ++++++++--
> > kernel/cpu.c | 19 +++++++++++++++++++
> > 2 files changed, 27 insertions(+), 2 deletions(-)
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -126,6 +126,7 @@ extern struct cpumask __cpu_dying_mask;
> > #define cpu_dying_mask ((const struct cpumask *)&__cpu_dying_mask)
> >
> > extern atomic_t __num_online_cpus;
> > +extern unsigned int __num_possible_cpus;
> >
> > extern cpumask_t cpus_booted_once_mask;
> >
> > @@ -1152,13 +1153,13 @@ void init_cpu_possible(const struct cpum
> > #define __assign_cpu(cpu, mask, val) \
> > __assign_bit(cpumask_check(cpu), cpumask_bits(mask), (val))
> >
> > -#define set_cpu_possible(cpu, possible) assign_cpu((cpu), &__cpu_possible_mask, (possible))
> > #define set_cpu_enabled(cpu, enabled) assign_cpu((cpu), &__cpu_enabled_mask, (enabled))
> > #define set_cpu_present(cpu, present) assign_cpu((cpu), &__cpu_present_mask, (present))
> > #define set_cpu_active(cpu, active) assign_cpu((cpu), &__cpu_active_mask, (active))
> > #define set_cpu_dying(cpu, dying) assign_cpu((cpu), &__cpu_dying_mask, (dying))
> >
> > void set_cpu_online(unsigned int cpu, bool online);
> > +void set_cpu_possible(unsigned int cpu, bool possible);
> >
> > /**
> > * to_cpumask - convert a NR_CPUS bitmap to a struct cpumask *
> > @@ -1211,7 +1212,12 @@ static __always_inline unsigned int num_
> >  {
> >  	return raw_atomic_read(&__num_online_cpus);
> >  }
> > -#define num_possible_cpus()	cpumask_weight(cpu_possible_mask)
> > +
> > +static __always_inline unsigned int num_possible_cpus(void)
> > +{
> > +	return __num_possible_cpus;
> > +}
> > +
> > #define num_enabled_cpus() cpumask_weight(cpu_enabled_mask)
> > #define num_present_cpus() cpumask_weight(cpu_present_mask)
> > #define num_active_cpus() cpumask_weight(cpu_active_mask)
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -3108,6 +3108,9 @@ EXPORT_SYMBOL(__cpu_dying_mask);
> > atomic_t __num_online_cpus __read_mostly;
> > EXPORT_SYMBOL(__num_online_cpus);
> >
> > +unsigned int __num_possible_cpus __ro_after_init = NR_CPUS;
> > +EXPORT_SYMBOL(__num_possible_cpus);
> > +
> >  void init_cpu_present(const struct cpumask *src)
> >  {
> >  	cpumask_copy(&__cpu_present_mask, src);
> > @@ -3116,6 +3119,7 @@ void init_cpu_present(const struct cpuma
> >  void init_cpu_possible(const struct cpumask *src)
> >  {
> >  	cpumask_copy(&__cpu_possible_mask, src);
> > +	__num_possible_cpus = cpumask_weight(&__cpu_possible_mask);
> >  }
> >
> >  void set_cpu_online(unsigned int cpu, bool online)
> > @@ -3139,6 +3143,21 @@ void set_cpu_online(unsigned int cpu, bo
> >  	}
> >  }
> >
> > +/*
> > + * This should be marked __init, but there is a boatload of call sites
> > + * which need to be fixed up to do so. Sigh...
> > + */
> > +void set_cpu_possible(unsigned int cpu, bool possible)
> > +{
> > +	if (possible) {
> > +		if (!cpumask_test_and_set_cpu(cpu, &__cpu_possible_mask))
> > +			__num_possible_cpus++;
> > +	} else {
> > +		if (cpumask_test_and_clear_cpu(cpu, &__cpu_possible_mask))
> > +			__num_possible_cpus--;
> > +	}
> > +}
> > +
> > /*
> > * Activate the first processor.
> > */
> >
> >
> >
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>