Message-ID: <20260119051835.GA696111@joelbox2>
Date: Mon, 19 Jan 2026 00:18:35 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Samir M <samir@...ux.ibm.com>, Vishal Chourasia <vishalc@...ux.ibm.com>,
        rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: paulmck@...nel.org, frederic@...nel.org, neeraj.upadhyay@...nel.org,
        josh@...htriplett.org, boqun.feng@...il.com, urezki@...il.com,
        rostedt@...dmis.org, tglx@...utronix.de, peterz@...radead.org,
        sshegde@...ux.ibm.com, srikar@...ux.ibm.com
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug
        operations

On Sun, Jan 18, 2026 at 05:08:44PM +0530, Samir M wrote:
> On 12/01/26 3:13 pm, Vishal Chourasia wrote:
> > Bulk CPU hotplug operations--such as switching SMT modes across all
> > cores--require hotplugging multiple CPUs in rapid succession. On large
> > systems, this process takes significant time, increasing as the number
> > of CPUs grows, leading to substantial delays on high-core-count
> > machines. Analysis [1] reveals that the majority of this time is spent
> > waiting for synchronize_rcu().
> >
> > Expedite synchronize_rcu() during the hotplug path to accelerate the
> > operation. Since CPU hotplug is a user-initiated administrative task,
> > it should complete as quickly as possible.
>
> Hi Vishal,
>
> I verified this patch using the configuration described below.
> Configuration:
> * Kernel version: 6.19.0-rc5
> * Number of CPUs: 2048
>
> SMT Mode | Without Patch | With Patch  | % Improvement |
> --------------------------------------------------------
> SMT=off  | 30m 53.194s   | 6m 4.250s   | +80.40%       |
> SMT=on   | 49m 5.920s    | 36m 50.386s | +25.01%       |

Hi Vishal, Samir,

Thanks for testing on your large-CPU-count system.

Considering that the SMT=on performance is still terrible, before we
expedite RCU, could we try the approach Peter suggested (avoiding the
repeated lock/unlock)? I wrote a patch, included below and pushed here:

  git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git
  tag: cpuhp-bulk-optimize-rfc-v1

I tested it lightly with the rcutorture hotplug test and it passes.
Please share any performance results, thanks.

Also, TBH I'd like to treat expediting RCU as a last resort; we should
first optimize the outer operations that require RCU in the first place,
as Peter suggested, since that improves the overall efficiency of the
code. And if/when we do expedite RCU, Peter's other suggestion to do it
from cpuhp_smt_enable() rather than in cpus_write_lock() also makes sense
to me.
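
Not part of the patch below, but for reference, here is a rough sketch of
what scoping the expedited request to cpuhp_smt_enable() could look like,
using the existing rcu_expedite_gp()/rcu_unexpedite_gp() nesting API. The
surrounding function body is reproduced from mainline as I recall it, and
the exact placement of the calls is only an assumption for illustration;
it would also want to be combined with the batching change below:

/*
 * Illustrative sketch: request expedited grace periods only for the
 * duration of the bulk SMT-enable operation, instead of expediting
 * inside cpus_write_lock().
 */
int cpuhp_smt_enable(void)
{
	int cpu, ret = 0;

	cpu_maps_update_begin();
	/* Make synchronize_rcu() calls issued during the batch expedited. */
	rcu_expedite_gp();
	cpu_smt_control = CPU_SMT_ENABLED;
	for_each_present_cpu(cpu) {
		/* Skip online CPUs and CPUs on offline nodes */
		if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
			continue;
		if (!cpu_smt_thread_allowed(cpu) || !topology_is_core_online(cpu))
			continue;
		ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
		if (ret)
			break;
		/* See comment in cpuhp_smt_disable() */
		cpuhp_online_cpu_device(cpu);
	}
	/* Drop the expedited-GP request before returning. */
	rcu_unexpedite_gp();
	cpu_maps_update_done();
	return ret;
}

Note that rcu_expedite_gp() is a system-wide, nested request, so unrelated
synchronize_rcu() callers would also be expedited while the batch runs,
which is part of why I'd prefer it as a last resort.
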
---8<-----------------------
From: Joel Fernandes <joelagnelf@...dia.com>
Subject: [PATCH] cpuhp: Optimize batch SMT enable by reducing lock acquisitions

Bulk CPU hotplug operations, such as enabling SMT across all cores,
require hotplugging multiple CPUs. The current implementation takes
cpus_write_lock() for each individual CPU, causing multiple slow
grace-period requests.

Therefore, introduce cpu_up_locked(), which assumes the caller already
holds cpus_write_lock(), and update cpuhp_smt_enable() to hold the lock
once around the entire loop rather than once per CPU.
Link: https://lore.kernel.org/all/20260113090153.GS830755@noisy.programming.kicks-ass.net/
Suggested-by: Peter Zijlstra <peterz@...radead.org>
Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
---
kernel/cpu.c | 40 +++++++++++++++++++++++++---------------
1 file changed, 25 insertions(+), 15 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 8df2d773fe3b..4ce7deb236d7 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1623,34 +1623,31 @@ void cpuhp_online_idle(enum cpuhp_state state)
complete_ap_thread(st, true);
}
-/* Requires cpu_add_remove_lock to be held */
-static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
+/* Requires cpu_add_remove_lock and cpus_write_lock to be held. */
+static int cpu_up_locked(unsigned int cpu, int tasks_frozen,
+ enum cpuhp_state target)
{
struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
struct task_struct *idle;
int ret = 0;
- cpus_write_lock();
+ lockdep_assert_cpus_held();
- if (!cpu_present(cpu)) {
- ret = -EINVAL;
- goto out;
- }
+ if (!cpu_present(cpu))
+ return -EINVAL;
/*
* The caller of cpu_up() might have raced with another
* caller. Nothing to do.
*/
if (st->state >= target)
- goto out;
+ return 0;
if (st->state == CPUHP_OFFLINE) {
/* Let it fail before we try to bring the cpu up */
idle = idle_thread_get(cpu);
- if (IS_ERR(idle)) {
- ret = PTR_ERR(idle);
- goto out;
- }
+ if (IS_ERR(idle))
+ return PTR_ERR(idle);
/*
* Reset stale stack state from the last time this CPU was online.
@@ -1673,7 +1670,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
* return the error code..
*/
if (ret)
- goto out;
+ return ret;
}
/*
@@ -1683,7 +1680,16 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
*/
target = min((int)target, CPUHP_BRINGUP_CPU);
ret = cpuhp_up_callbacks(cpu, st, target);
-out:
+ return ret;
+}
+
+/* Requires cpu_add_remove_lock to be held */
+static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
+{
+ int ret;
+
+ cpus_write_lock();
+ ret = cpu_up_locked(cpu, tasks_frozen, target);
cpus_write_unlock();
arch_smt_update();
return ret;
@@ -2715,6 +2721,8 @@ int cpuhp_smt_enable(void)
int cpu, ret = 0;
cpu_maps_update_begin();
+ /* Hold cpus_write_lock() for entire batch operation. */
+ cpus_write_lock();
cpu_smt_control = CPU_SMT_ENABLED;
for_each_present_cpu(cpu) {
/* Skip online CPUs and CPUs on offline nodes */
@@ -2722,12 +2730,14 @@ int cpuhp_smt_enable(void)
continue;
if (!cpu_smt_thread_allowed(cpu) || !topology_is_core_online(cpu))
continue;
- ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
+ ret = cpu_up_locked(cpu, 0, CPUHP_ONLINE);
if (ret)
break;
/* See comment in cpuhp_smt_disable() */
cpuhp_online_cpu_device(cpu);
}
+ cpus_write_unlock();
+ arch_smt_update();
cpu_maps_update_done();
return ret;
}
--
2.34.1