linux-kernel - Re: [PATCH] cpuhp: Expedite synchronize

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0299355F-E474-4892-AFC1-11C63B6F4FCC@nvidia.com>
Date: Mon, 19 Jan 2026 16:10:48 -0500
From: <joelagnelf@...dia.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>
Cc: <paulmck@...nel.org>, <frederic@...nel.org>, <neeraj.upadhyay@...nel.org>, <josh@...htriplett.org>, <boqun.feng@...il.com>, <urezki@...il.com>, <rostedt@...dmis.org>, <tglx@...utronix.de>, <peterz@...radead.org>, <srikar@...ux.ibm.com>, Samir M <samir@...ux.ibm.com>, Vishal Chourasia <vishalc@...ux.ibm.com>, <rcu@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug operations



> On Jan 19, 2026, at 8:54 AM, Shrikanth Hegde  wrote:
> 
> 
> 
>> On 1/19/26 10:48 AM, Joel Fernandes wrote:
>>> On Sun, Jan 18, 2026 at 05:08:44PM +0530, Samir M wrote:
>>>> On 12/01/26 3:13 pm, Vishal Chourasia wrote:
>>> > Bulk CPU hotplug operations--such as switching SMT modes across all
>>> > cores--require hotplugging multiple CPUs in rapid succession. On large
>>> > systems, this process takes significant time, increasing as the number
>>> > of CPUs grows, leading to substantial delays on high-core-count
>>> > machines. Analysis [1] reveals that the majority of this time is spent
>>> > waiting for synchronize_rcu().
>>> >
>>> > Expedite synchronize_rcu() during the hotplug path to accelerate the
>>> > operation. Since CPU hotplug is a user-initiated administrative task,
>>> > it should complete as quickly as possible.
>>> 
>>> Hi Vishal,
>>> 
>>> I verified this patch using the configuration described below.
>>> Configuration:
>>>      *    Kernel version: 6.19.0-rc5
>>>      *    Number of CPUs: 2048
>>> 
>>> SMT Mode    | Without Patch    | With Patch | % Improvement    |
>>> ------------------------------------------------------------------
>>> SMT=off     | 30m 53.194s      |  6m 4.250s  | +80.40%          |
>>> SMT=on      | 49m 5.920s       | 36m 50.386s | +25.01%          |
>> Hi Vishal, Samir,
>> Thanks for the testing on your large CPU count system.
>> Considering the SMT=on performance is still terrible, before we expedite RCU, could we try the approach Peter suggested (avoiding repeated
>> lock/unlock)? I wrote a patch below.
>>   git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git
>>   tag: cpuhp-bulk-optimize-rfc-v1
>> I tested it lightly on rcutorture hotplug test and it passes. Please share
>> any performance results, thanks.
>> Also I'd like to use expediting of RCU as a last resort TBH, we should
>> optimize the outer operations that require RCU in the first place such as
>> Peter's suggestion since that will improve the overall efficiency of the
>> code. And if/when expediting RCU, Peter's other suggestion to not do it in
>> cpus_write_lock() and instead do it from cpuhp_smt_enable() also makes sense
>> to me.
>> ---8<-----------------------
>> From: Joel Fernandes 
>> Subject: [PATCH] cpuhp: Optimize batch SMT enable by reducing lock acquiring
>> Bulk CPU hotplug operations such as enabling SMT across all cores
>> require hotplugging multiple CPUs. The current implementation takes
>> cpus_write_lock() for each individual CPU causing multiple slow grace
>> period requests.
>> Therefore introduce cpu_up_locked() that assumes the caller already
>> holds cpus_write_lock(). The cpuhp_smt_enable() function is updated to
>> hold the lock once around the entire loop rather than for each CPU.
>> Link: https://lore.kernel.org/ all/20260113090153.GS830755@...sy.programming.kicks-ass.net/
>> Suggested-by: Peter Zijlstra 
>> Signed-off-by: Joel Fernandes 
>> ---
>>  kernel/cpu.c | 40 +++++++++++++++++++++++++---------------
>>  1 file changed, 25 insertions(+), 15 deletions(-)
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 8df2d773fe3b..4ce7deb236d7 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -1623,34 +1623,31 @@ void cpuhp_online_idle(enum cpuhp_state state)
>>      complete_ap_thread(st, true);
>>  }
>> -/* Requires cpu_add_remove_lock to be held */
>> -static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>> +/* Requires cpu_add_remove_lock and cpus_write_lock to be held. */
>> +static int cpu_up_locked(unsigned int cpu, int tasks_frozen,
>> +             enum cpuhp_state target)
>>  {
>>      struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>>      struct task_struct *idle;
>>      int ret = 0;
>> -    cpus_write_lock();
>> +    lockdep_assert_cpus_held();
>> -    if (!cpu_present(cpu)) {
>> -        ret = -EINVAL;
>> -        goto out;
>> -    }
>> +    if (!cpu_present(cpu))
>> +        return -EINVAL;
>>      /*
>>       * The caller of cpu_up() might have raced with another
>>       * caller. Nothing to do.
>>       */
>>      if (st->state >= target)
>> -        goto out;
>> +        return 0;
>>      if (st->state == CPUHP_OFFLINE) {
>>          /* Let it fail before we try to bring the cpu up */
>>          idle = idle_thread_get(cpu);
>> -        if (IS_ERR(idle)) {
>> -            ret = PTR_ERR(idle);
>> -            goto out;
>> -        }
>> +        if (IS_ERR(idle))
>> +            return PTR_ERR(idle);
>>          /*
>>           * Reset stale stack state from the last time this CPU was online.
>> @@ -1673,7 +1670,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>>           * return the error code..
>>           */
>>          if (ret)
>> -            goto out;
>> +            return ret;
>>      }
>>      /*
>> @@ -1683,7 +1680,16 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>>       */
>>      target = min((int)target, CPUHP_BRINGUP_CPU);
>>      ret = cpuhp_up_callbacks(cpu, st, target);
>> -out:
>> +    return ret;
>> +}
>> +
>> +/* Requires cpu_add_remove_lock to be held */
>> +static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>> +{
>> +    int ret;
>> +
>> +    cpus_write_lock();
>> +    ret = cpu_up_locked(cpu, tasks_frozen, target);
>>      cpus_write_unlock();
>>      arch_smt_update();
>>      return ret;
>> @@ -2715,6 +2721,8 @@ int cpuhp_smt_enable(void)
>>      int cpu, ret = 0;
>>      cpu_maps_update_begin();
>> +    /* Hold cpus_write_lock() for entire batch operation. */
>> +    cpus_write_lock();
>>      cpu_smt_control = CPU_SMT_ENABLED;
>>      for_each_present_cpu(cpu) {
>>          /* Skip online CPUs and CPUs on offline nodes */
>> @@ -2722,12 +2730,14 @@ int cpuhp_smt_enable(void)
>>              continue;
>>          if (!cpu_smt_thread_allowed(cpu) || ! topology_is_core_online(cpu))
>>              continue;
>> -        ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
>> +        ret = cpu_up_locked(cpu, 0, CPUHP_ONLINE);
>>          if (ret)
>>              break;
>>          /* See comment in cpuhp_smt_disable() */
>>          cpuhp_online_cpu_device(cpu);
>>      }
>> +    cpus_write_unlock();
>> +    arch_smt_update();
>>      cpu_maps_update_done();
>>      return ret;
>>  }
> 
> What about cpuhp_smt_disable?

Doing this is not that easy on disable path, AFAICS. Considering that the enable
path in the performance tests was much worse, I wanted to contain it to that.

This does not have to be the only fix though, but one of the cures to get there.

Thanks.