Message-ID: <8dbba75f-329d-4e86-b61e-38be0b101b0b@paulmck-laptop>
Date: Mon, 12 Jan 2026 16:03:33 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Vishal Chourasia <vishalc@...ux.ibm.com>
Cc: Uladzislau Rezki <urezki@...il.com>,
Joel Fernandes <joelagnelf@...dia.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"frederic@...nel.org" <frederic@...nel.org>,
"neeraj.upadhyay@...nel.org" <neeraj.upadhyay@...nel.org>,
"josh@...htriplett.org" <josh@...htriplett.org>,
"boqun.feng@...il.com" <boqun.feng@...il.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"peterz@...radead.org" <peterz@...radead.org>,
"srikar@...ux.ibm.com" <srikar@...ux.ibm.com>
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug operations

On Mon, Jan 12, 2026 at 11:57:41PM +0530, Vishal Chourasia wrote:
> Hello Joel, Paul, Uladzislau,
>
> On Mon, Jan 12, 2026 at 06:05:30PM +0100, Uladzislau Rezki wrote:
> > On Mon, Jan 12, 2026 at 08:48:42AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 12, 2026 at 04:09:49PM +0000, Joel Fernandes wrote:
> > > >
> > > >
> > > > > On Jan 12, 2026, at 7:57 AM, Uladzislau Rezki <urezki@...il.com> wrote:
> > > > >
> > > > >>
> > > > > Sounds good to me. I agree it is better to bypass parameters.
> > > >
> > > > Another way to make it in-kernel would be to make the RCU normal wake from GP optimization enabled for > 16 CPUs by default.
> > > >
> > > > I was considering this, but I did not bring it up because I did not know that there are large systems that might benefit from it until now.
> > >
> > > This would require increasing the scalability of this optimization,
> > > right? Or am I thinking of the wrong optimization? ;-)
> > >
> > I tested this before. I noticed that after 64K of simultaneous
> > synchronize_rcu() calls the scalability is required. Everything
> > less was faster with a new approach.
>
> It is worth noting that bulk CPU hotplug represents a different stress
> pattern than the "simultaneous call" scenario mentioned above.
>
> In a large-scale hotplug event (like an SMT mode switch), we aren't
> necessarily seeing thousands of simultaneous synchronize_rcu() calls.
> Instead, because CPU hotplug operations are serialized, we see a
> "conveyor belt" of sequential calls. One synchronize_rcu() blocks, the
> hotplug state machine waits, it unblocks, and then the next call is
> triggered shortly after.
>
> The bottleneck here isn't RCU scalability under concurrent load, but
> rather the accumulated latency of hundreds of sequential Grace Periods.
>
> For example, on pSeries, onlining 350 out of 400 CPUs triggers exactly
> 350 calls at each of three different points in the hotplug state machine. Even
> though they happen one at a time, the sheer volume makes the total
> operation time prohibitive.
>
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were onlined:
>
> @[
> synchronize_rcu+12
> cpuidle_pause_and_lock+120
> pseries_cpuidle_cpu_online+88
> cpuhp_invoke_callback+500
> cpuhp_thread_fun+316
> smpboot_thread_fn+512
> kthread+308
> start_kernel_thread+20
> ]: 350
> @[
> synchronize_rcu+12
> rcu_sync_enter+260
> percpu_down_write+76
> _cpu_up+140
> cpu_up+440
> cpu_subsys_online+128
> device_online+176
> online_store+220
> dev_attr_store+52
> sysfs_kf_write+120
> kernfs_fop_write_iter+456
> vfs_write+952
> ksys_write+132
> system_call_exception+292
> system_call_vectored_common+348
> ]: 350
> @[
> synchronize_rcu+12
> rcu_sync_enter+260
> percpu_down_write+76
> try_online_node+64
> cpu_up+120
> cpu_subsys_online+128
> device_online+176
> online_store+220
> dev_attr_store+52
> sysfs_kf_write+120
> kernfs_fop_write_iter+456
> vfs_write+952
> ksys_write+132
> system_call_exception+292
> system_call_vectored_common+348
> ]: 350
>
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were offlined:
>
> @[
> synchronize_rcu+12
> rcu_sync_enter+260
> percpu_down_write+76
> _cpu_down+188
> __cpu_down_maps_locked+44
> work_for_cpu_fn+56
> process_one_work+508
> worker_thread+840
> kthread+308
> start_kernel_thread+20
> ]: 1
> @[
> synchronize_rcu+12
> sched_cpu_deactivate+244
> cpuhp_invoke_callback+500
> cpuhp_thread_fun+316
> smpboot_thread_fn+512
> kthread+308
> start_kernel_thread+20
> ]: 350
> @[
> synchronize_rcu+12
> cpuidle_pause_and_lock+120
> pseries_cpuidle_cpu_dead+88
> cpuhp_invoke_callback+500
> __cpuhp_invoke_callback_range+200
> _cpu_down+412
> __cpu_down_maps_locked+44
> work_for_cpu_fn+56
> process_one_work+508
> worker_thread+840
> kthread+308
> start_kernel_thread+20
> ]: 350

I still suggest that you test on a big system. There are other sources
of synchronize_rcu() calls than just CPU hotplug. ;-)

Thanx, Paul
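
The "conveyor belt" argument in the quoted message is simple arithmetic, and a rough sketch may make it concrete. The per-call-site counts below are copied from the quoted traces. The grace-period latency is NOT taken from this thread: ASSUMED_GP_MS is a hypothetical placeholder, and the helper names are illustrative only. The sketch also assumes every call waits for a full grace period with no overlap, which is the serialized case described above.

```python
# Per-call-site synchronize_rcu() counts, copied from the quoted traces
# (350 of 400 CPUs onlined, then 350 of 400 CPUs offlined).
ONLINE_COUNTS = {
    "cpuidle_pause_and_lock <- pseries_cpuidle_cpu_online": 350,
    "rcu_sync_enter <- percpu_down_write <- _cpu_up": 350,
    "rcu_sync_enter <- percpu_down_write <- try_online_node": 350,
}
OFFLINE_COUNTS = {
    "rcu_sync_enter <- percpu_down_write <- _cpu_down": 1,
    "sched_cpu_deactivate": 350,
    "cpuidle_pause_and_lock <- pseries_cpuidle_cpu_dead": 350,
}


def total_calls(counts):
    """Total number of synchronize_rcu() calls across all call sites."""
    return sum(counts.values())


def serialized_wait_seconds(counts, gp_latency_ms):
    """Accumulated wait if each call blocks for one full grace period
    and the calls are issued strictly one after another."""
    return total_calls(counts) * gp_latency_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical per-call latency; substitute a measured value.
    ASSUMED_GP_MS = 20.0
    for name, counts in (("online", ONLINE_COUNTS), ("offline", OFFLINE_COUNTS)):
        calls = total_calls(counts)
        wait = serialized_wait_seconds(counts, ASSUMED_GP_MS)
        print(f"{name}: {calls} calls, ~{wait:.2f} s at {ASSUMED_GP_MS} ms each")
```

Because the calls are serialized, the total grows linearly with both the number of CPUs being hotplugged and the per-call latency. This estimate covers only the hotplug path and says nothing about the other sources of synchronize_rcu() calls on a large system.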