Message-ID: <aWU9HRcs4ghazIRg@linux.ibm.com>
Date: Mon, 12 Jan 2026 23:57:41 +0530
From: Vishal Chourasia <vishalc@...ux.ibm.com>
To: Uladzislau Rezki <urezki@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
        Joel Fernandes <joelagnelf@...dia.com>,
        Shrikanth Hegde <sshegde@...ux.ibm.com>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "frederic@...nel.org" <frederic@...nel.org>,
        "neeraj.upadhyay@...nel.org" <neeraj.upadhyay@...nel.org>,
        "josh@...htriplett.org" <josh@...htriplett.org>,
        "boqun.feng@...il.com" <boqun.feng@...il.com>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "srikar@...ux.ibm.com" <srikar@...ux.ibm.com>
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug
 operations

Hello Joel, Paul, Uladzislau,

On Mon, Jan 12, 2026 at 06:05:30PM +0100, Uladzislau Rezki wrote:
> On Mon, Jan 12, 2026 at 08:48:42AM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 12, 2026 at 04:09:49PM +0000, Joel Fernandes wrote:
> > > 
> > > 
> > > > On Jan 12, 2026, at 7:57 AM, Uladzislau Rezki <urezki@...il.com> wrote:
> > > > 
> > > >> 
> > > > Sounds good to me. I agree it is better to bypass parameters.
> > > 
> > > Another way to make it in-kernel would be to make the RCU normal wake from GP optimization enabled for > 16 CPUs by default.
> > > 
> > > I was considering this, but I did not bring it up because I did not know that there are large systems that might benefit from it until now.
> > 
> > This would require increasing the scalability of this optimization,
> > right?  Or am I thinking of the wrong optimization?  ;-)
> > 
> I tested this before. I noticed that extra scalability work is only
> needed beyond 64K simultaneous synchronize_rcu() calls; everything
> below that was faster with the new approach.
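
Regarding the in-kernel default Joel mentions above: if I understand
the idea correctly, it would amount to something along these lines
(a rough sketch only; the hypothetical helper, its call site in
rcu_init(), and the reuse of the existing rcu_normal_wake_from_gp
knob in kernel/rcu/tree.c are my guesses, not an actual patch):

    /*
     * Illustrative sketch, not a real patch. Would live in
     * kernel/rcu/tree.c and be called early from rcu_init().
     */
    static void __init rcu_set_large_system_defaults(void)
    {
            /*
             * Assumption: turn the existing rcu_normal_wake_from_gp
             * knob on by default on larger systems (threshold of 16
             * taken from Joel's suggestion above); a user override
             * from the command line is not handled here.
             */
            if (num_possible_cpus() > 16)
                    rcu_normal_wake_from_gp = 1;
    }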

It is worth noting that bulk CPU hotplug represents a different stress
pattern than the "simultaneous call" scenario mentioned above.

In a large-scale hotplug event (like an SMT mode switch), we aren't
necessarily seeing thousands of simultaneous synchronize_rcu() calls.
Instead, because CPU hotplug operations are serialized, we see a
"conveyor belt" of sequential calls. One synchronize_rcu() blocks, the
hotplug state machine waits, it unblocks, and then the next call is
triggered shortly after.
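
A rough sketch of that pattern (illustrative only; smt_switch_online()
is a hypothetical helper, not the actual cpuhp code path, and the
device_hotplug_lock handling that real callers need is omitted):

    #include <linux/cpu.h>
    #include <linux/cpumask.h>
    #include <linux/device.h>

    /* Illustrative sketch of the serialized online loop. */
    static void smt_switch_online(const struct cpumask *to_online)
    {
            int cpu;

            for_each_cpu(cpu, to_online) {
                    /*
                     * Each online transition walks hotplug states that
                     * call synchronize_rcu() (e.g. rcu_sync_enter()
                     * under percpu_down_write(), and
                     * cpuidle_pause_and_lock()).  The next CPU cannot
                     * start until this one finishes, so grace-period
                     * latency accumulates linearly with the CPU count.
                     */
                    device_online(get_cpu_device(cpu));
            }
    }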

The bottleneck here isn't RCU scalability under concurrent load, but
rather the accumulated latency of hundreds of sequential grace periods.

For example, on pSeries, onlining 350 out of 400 CPUs triggers 350
synchronize_rcu() calls at each of three different points in the
hotplug state machine. Even though they happen one at a time, the
sheer volume makes the total operation time prohibitive.
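
As a back-of-envelope illustration (the ~25 ms per grace period below
is a purely hypothetical figure, not a measurement from this system):

    350 CPUs x 3 synchronize_rcu() points x ~25 ms/GP  ~=  26 seconds

spent doing nothing but waiting for normal grace periods to complete.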

The following call stacks were collected during an SMT mode switch
where 350 out of 400 CPUs were onlined:

@[
    synchronize_rcu+12
    cpuidle_pause_and_lock+120
    pseries_cpuidle_cpu_online+88
    cpuhp_invoke_callback+500
    cpuhp_thread_fun+316
    smpboot_thread_fn+512
    kthread+308
    start_kernel_thread+20
]: 350
@[
    synchronize_rcu+12
    rcu_sync_enter+260
    percpu_down_write+76
    _cpu_up+140
    cpu_up+440
    cpu_subsys_online+128
    device_online+176
    online_store+220
    dev_attr_store+52
    sysfs_kf_write+120
    kernfs_fop_write_iter+456
    vfs_write+952
    ksys_write+132
    system_call_exception+292
    system_call_vectored_common+348
]: 350
@[
    synchronize_rcu+12
    rcu_sync_enter+260
    percpu_down_write+76
    try_online_node+64
    cpu_up+120
    cpu_subsys_online+128
    device_online+176
    online_store+220
    dev_attr_store+52
    sysfs_kf_write+120
    kernfs_fop_write_iter+456
    vfs_write+952
    ksys_write+132
    system_call_exception+292
    system_call_vectored_common+348
]: 350

The following call stacks were collected during an SMT mode switch
where 350 out of 400 CPUs were offlined:

@[
    synchronize_rcu+12
    rcu_sync_enter+260
    percpu_down_write+76
    _cpu_down+188
    __cpu_down_maps_locked+44
    work_for_cpu_fn+56
    process_one_work+508
    worker_thread+840
    kthread+308
    start_kernel_thread+20
]: 1
@[
    synchronize_rcu+12
    sched_cpu_deactivate+244
    cpuhp_invoke_callback+500
    cpuhp_thread_fun+316
    smpboot_thread_fn+512
    kthread+308
    start_kernel_thread+20
]: 350
@[
    synchronize_rcu+12
    cpuidle_pause_and_lock+120
    pseries_cpuidle_cpu_dead+88
    cpuhp_invoke_callback+500
    __cpuhp_invoke_callback_range+200
    _cpu_down+412
    __cpu_down_maps_locked+44
    work_for_cpu_fn+56
    process_one_work+508
    worker_thread+840
    kthread+308
    start_kernel_thread+20
]: 350


- vishalc
