[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0299355F-E474-4892-AFC1-11C63B6F4FCC@nvidia.com>
Date: Mon, 19 Jan 2026 16:10:48 -0500
From: <joelagnelf@...dia.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>
Cc: <paulmck@...nel.org>, <frederic@...nel.org>, <neeraj.upadhyay@...nel.org>, <josh@...htriplett.org>, <boqun.feng@...il.com>, <urezki@...il.com>, <rostedt@...dmis.org>, <tglx@...utronix.de>, <peterz@...radead.org>, <srikar@...ux.ibm.com>, Samir M <samir@...ux.ibm.com>, Vishal Chourasia <vishalc@...ux.ibm.com>, <rcu@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug operations
> On Jan 19, 2026, at 8:54 AM, Shrikanth Hegde wrote:
>
>
>
>> On 1/19/26 10:48 AM, Joel Fernandes wrote:
>>> On Sun, Jan 18, 2026 at 05:08:44PM +0530, Samir M wrote:
>>>> On 12/01/26 3:13 pm, Vishal Chourasia wrote:
>>> > Bulk CPU hotplug operations--such as switching SMT modes across all
>>> > cores--require hotplugging multiple CPUs in rapid succession. On large
>>> > systems, this process takes significant time, increasing as the number
>>> > of CPUs grows, leading to substantial delays on high-core-count
>>> > machines. Analysis [1] reveals that the majority of this time is spent
>>> > waiting for synchronize_rcu().
>>> >
>>> > Expedite synchronize_rcu() during the hotplug path to accelerate the
>>> > operation. Since CPU hotplug is a user-initiated administrative task,
>>> > it should complete as quickly as possible.
>>>
>>> Hi Vishal,
>>>
>>> I verified this patch using the configuration described below.
>>> Configuration:
>>> * Kernel version: 6.19.0-rc5
>>> * Number of CPUs: 2048
>>>
>>> SMT Mode | Without Patch | With Patch | % Improvement |
>>> ------------------------------------------------------------------
>>> SMT=off | 30m 53.194s | 6m 4.250s | +80.40% |
>>> SMT=on | 49m 5.920s | 36m 50.386s | +25.01% |
>> Hi Vishal, Samir,
>> Thanks for the testing on your large CPU count system.
>> Considering the SMT=on performance is still terrible, before we expedite RCU, could we try the approach Peter suggested (avoiding repeated
>> lock/unlock)? I wrote a patch below.
>> git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git
>> tag: cpuhp-bulk-optimize-rfc-v1
>> I tested it lightly on rcutorture hotplug test and it passes. Please share
>> any performance results, thanks.
>> Also I'd like to use expediting of RCU as a last resort TBH, we should
>> optimize the outer operations that require RCU in the first place such as
>> Peter's suggestion since that will improve the overall efficiency of the
>> code. And if/when expediting RCU, Peter's other suggestion to not do it in
>> cpus_write_lock() and instead do it from cpuhp_smt_enable() also makes sense
>> to me.
>> ---8<-----------------------
>> From: Joel Fernandes
>> Subject: [PATCH] cpuhp: Optimize batch SMT enable by reducing lock acquiring
>> Bulk CPU hotplug operations such as enabling SMT across all cores
>> require hotplugging multiple CPUs. The current implementation takes
>> cpus_write_lock() for each individual CPU causing multiple slow grace
>> period requests.
>> Therefore introduce cpu_up_locked() that assumes the caller already
>> holds cpus_write_lock(). The cpuhp_smt_enable() function is updated to
>> hold the lock once around the entire loop rather than for each CPU.
>> Link: https://lore.kernel.org/ all/20260113090153.GS830755@...sy.programming.kicks-ass.net/
>> Suggested-by: Peter Zijlstra
>> Signed-off-by: Joel Fernandes
>> ---
>> kernel/cpu.c | 40 +++++++++++++++++++++++++---------------
>> 1 file changed, 25 insertions(+), 15 deletions(-)
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 8df2d773fe3b..4ce7deb236d7 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -1623,34 +1623,31 @@ void cpuhp_online_idle(enum cpuhp_state state)
>> complete_ap_thread(st, true);
>> }
>> -/* Requires cpu_add_remove_lock to be held */
>> -static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>> +/* Requires cpu_add_remove_lock and cpus_write_lock to be held. */
>> +static int cpu_up_locked(unsigned int cpu, int tasks_frozen,
>> + enum cpuhp_state target)
>> {
>> struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>> struct task_struct *idle;
>> int ret = 0;
>> - cpus_write_lock();
>> + lockdep_assert_cpus_held();
>> - if (!cpu_present(cpu)) {
>> - ret = -EINVAL;
>> - goto out;
>> - }
>> + if (!cpu_present(cpu))
>> + return -EINVAL;
>> /*
>> * The caller of cpu_up() might have raced with another
>> * caller. Nothing to do.
>> */
>> if (st->state >= target)
>> - goto out;
>> + return 0;
>> if (st->state == CPUHP_OFFLINE) {
>> /* Let it fail before we try to bring the cpu up */
>> idle = idle_thread_get(cpu);
>> - if (IS_ERR(idle)) {
>> - ret = PTR_ERR(idle);
>> - goto out;
>> - }
>> + if (IS_ERR(idle))
>> + return PTR_ERR(idle);
>> /*
>> * Reset stale stack state from the last time this CPU was online.
>> @@ -1673,7 +1670,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>> * return the error code..
>> */
>> if (ret)
>> - goto out;
>> + return ret;
>> }
>> /*
>> @@ -1683,7 +1680,16 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>> */
>> target = min((int)target, CPUHP_BRINGUP_CPU);
>> ret = cpuhp_up_callbacks(cpu, st, target);
>> -out:
>> + return ret;
>> +}
>> +
>> +/* Requires cpu_add_remove_lock to be held */
>> +static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>> +{
>> + int ret;
>> +
>> + cpus_write_lock();
>> + ret = cpu_up_locked(cpu, tasks_frozen, target);
>> cpus_write_unlock();
>> arch_smt_update();
>> return ret;
>> @@ -2715,6 +2721,8 @@ int cpuhp_smt_enable(void)
>> int cpu, ret = 0;
>> cpu_maps_update_begin();
>> + /* Hold cpus_write_lock() for entire batch operation. */
>> + cpus_write_lock();
>> cpu_smt_control = CPU_SMT_ENABLED;
>> for_each_present_cpu(cpu) {
>> /* Skip online CPUs and CPUs on offline nodes */
>> @@ -2722,12 +2730,14 @@ int cpuhp_smt_enable(void)
>> continue;
>> if (!cpu_smt_thread_allowed(cpu) || ! topology_is_core_online(cpu))
>> continue;
>> - ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
>> + ret = cpu_up_locked(cpu, 0, CPUHP_ONLINE);
>> if (ret)
>> break;
>> /* See comment in cpuhp_smt_disable() */
>> cpuhp_online_cpu_device(cpu);
>> }
>> + cpus_write_unlock();
>> + arch_smt_update();
>> cpu_maps_update_done();
>> return ret;
>> }
>
> What about cpuhp_smt_disable?
Doing this is not that easy on disable path, AFAICS. Considering that the enable
path in the performance tests was much worse, I wanted to contain it to that.
This does not have to be the only fix though, but one of the cures to get there.
Thanks.
Powered by blists - more mailing lists