linux-kernel - Re: [PATCH] cpuhp: Expedite synchronize

Message-ID: <8dbba75f-329d-4e86-b61e-38be0b101b0b@paulmck-laptop>
Date: Mon, 12 Jan 2026 16:03:33 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Vishal Chourasia <vishalc@...ux.ibm.com>
Cc: Uladzislau Rezki <urezki@...il.com>,
	Joel Fernandes <joelagnelf@...dia.com>,
	Shrikanth Hegde <sshegde@...ux.ibm.com>,
	"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"frederic@...nel.org" <frederic@...nel.org>,
	"neeraj.upadhyay@...nel.org" <neeraj.upadhyay@...nel.org>,
	"josh@...htriplett.org" <josh@...htriplett.org>,
	"boqun.feng@...il.com" <boqun.feng@...il.com>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"peterz@...radead.org" <peterz@...radead.org>,
	"srikar@...ux.ibm.com" <srikar@...ux.ibm.com>
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug
 operations

On Mon, Jan 12, 2026 at 11:57:41PM +0530, Vishal Chourasia wrote:
> Hello Joel, Paul, Uladzislau,
> 
> On Mon, Jan 12, 2026 at 06:05:30PM +0100, Uladzislau Rezki wrote:
> > On Mon, Jan 12, 2026 at 08:48:42AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 12, 2026 at 04:09:49PM +0000, Joel Fernandes wrote:
> > > > 
> > > > 
> > > > > On Jan 12, 2026, at 7:57 AM, Uladzislau Rezki <urezki@...il.com> wrote:
> > > > > 
> > > > >> 
> > > > > Sounds good to me. I agree it is better to bypass parameters.
> > > > 
> > > > Another way to do this in-kernel would be to enable the RCU normal wake-from-GP optimization for > 16 CPUs by default.
> > > > 
> > > > I was considering this, but I did not bring it up because I did not know that there are large systems that might benefit from it until now.
> > > 
> > > This would require increasing the scalability of this optimization,
> > > right?  Or am I thinking of the wrong optimization?  ;-)
> > > 
> > I tested this before. I noticed that better scalability is needed
> > only beyond 64K simultaneous synchronize_rcu() calls. Anything
> > below that was faster with the new approach.
> 
> It is worth noting that bulk CPU hotplug represents a different stress
> pattern than the "simultaneous call" scenario mentioned above.
> 
> In a large-scale hotplug event (like an SMT mode switch), we aren't
> necessarily seeing thousands of simultaneous synchronize_rcu() calls.
> Instead, because CPU hotplug operations are serialized, we see a
> "conveyor belt" of sequential calls. One synchronize_rcu() blocks, the
> hotplug state machine waits, it unblocks, and then the next call is
> triggered shortly after.
> 
> The bottleneck here isn't RCU scalability under concurrent load, but
> rather the accumulated latency of hundreds of sequential grace periods.
> 
> For example, on pSeries, onlining 350 out of 400 CPUs triggers exactly
> 350 calls at each of three different points in the hotplug state
> machine. Even though they happen one at a time, the sheer volume makes
> the total operation time prohibitive.
> 
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were onlined:
> 
> @[
>     synchronize_rcu+12
>     cpuidle_pause_and_lock+120
>     pseries_cpuidle_cpu_online+88
>     cpuhp_invoke_callback+500
>     cpuhp_thread_fun+316
>     smpboot_thread_fn+512
>     kthread+308
>     start_kernel_thread+20
> ]: 350
> @[
>     synchronize_rcu+12
>     rcu_sync_enter+260
>     percpu_down_write+76
>     _cpu_up+140
>     cpu_up+440
>     cpu_subsys_online+128
>     device_online+176
>     online_store+220
>     dev_attr_store+52
>     sysfs_kf_write+120
>     kernfs_fop_write_iter+456
>     vfs_write+952
>     ksys_write+132
>     system_call_exception+292
>     system_call_vectored_common+348
> ]: 350
> @[
>     synchronize_rcu+12
>     rcu_sync_enter+260
>     percpu_down_write+76
>     try_online_node+64
>     cpu_up+120
>     cpu_subsys_online+128
>     device_online+176
>     online_store+220
>     dev_attr_store+52
>     sysfs_kf_write+120
>     kernfs_fop_write_iter+456
>     vfs_write+952
>     ksys_write+132
>     system_call_exception+292
>     system_call_vectored_common+348
> ]: 350
> 
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were offlined:
> 
> @[
>     synchronize_rcu+12
>     rcu_sync_enter+260
>     percpu_down_write+76
>     _cpu_down+188
>     __cpu_down_maps_locked+44
>     work_for_cpu_fn+56
>     process_one_work+508
>     worker_thread+840
>     kthread+308
>     start_kernel_thread+20
> ]: 1
> @[
>     synchronize_rcu+12
>     sched_cpu_deactivate+244
>     cpuhp_invoke_callback+500
>     cpuhp_thread_fun+316
>     smpboot_thread_fn+512
>     kthread+308
>     start_kernel_thread+20
> ]: 350
> @[
>     synchronize_rcu+12
>     cpuidle_pause_and_lock+120
>     pseries_cpuidle_cpu_dead+88
>     cpuhp_invoke_callback+500
>     __cpuhp_invoke_callback_range+200
>     _cpu_down+412
>     __cpu_down_maps_locked+44
>     work_for_cpu_fn+56
>     process_one_work+508
>     worker_thread+840
>     kthread+308
>     start_kernel_thread+20
> ]: 350
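
The counts in the quoted traces can be tallied to show why serialized
grace-period waits add up. This is only a rough sketch: the per-call
counts are copied from the traces above, but the grace-period latency
(ASSUMED_GP_MS) is a hypothetical placeholder, not a number measured or
reported anywhere in this thread, and the helper name is made up for
illustration.

```python
# Per-call-site synchronize_rcu() counts, copied from the quoted traces.
ONLINE_CALLS = {
    "cpuidle_pause_and_lock (pseries_cpuidle_cpu_online)": 350,
    "rcu_sync_enter via _cpu_up": 350,
    "rcu_sync_enter via try_online_node": 350,
}
OFFLINE_CALLS = {
    "rcu_sync_enter via _cpu_down": 1,
    "sched_cpu_deactivate": 350,
    "cpuidle_pause_and_lock (pseries_cpuidle_cpu_dead)": 350,
}

# ASSUMPTION: hypothetical latency of one normal grace period, in ms.
# Real values depend on the system, load, and kernel configuration.
ASSUMED_GP_MS = 20.0


def total_wait_seconds(calls, gp_latency_ms):
    """Sum the waits, assuming the calls are strictly serialized.

    Because hotplug operations run one after another, each call waits
    for its own grace period, so the delays add instead of overlapping.
    """
    return sum(calls.values()) * gp_latency_ms / 1000.0


for label, calls in (("online", ONLINE_CALLS), ("offline", OFFLINE_CALLS)):
    n = sum(calls.values())
    secs = total_wait_seconds(calls, ASSUMED_GP_MS)
    print(f"{label}: {n} sequential calls, about {secs:.2f} s waiting")
```

Under this model the total scales linearly with both the number of
CPUs being hotplugged and the latency of a single grace period, which
is why shortening each wait matters more here than handling many
concurrent callers.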

I still suggest that you test on a big system.  There are other sources
of synchronize_rcu() calls than just CPU hotplug.  ;-)

							Thanx, Paul