Message-ID: <aWUqtX2PGDOZUaDW@milan>
Date: Mon, 12 Jan 2026 18:09:09 +0100
From: Uladzislau Rezki <urezki@...il.com>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: Uladzislau Rezki <urezki@...il.com>,
	Shrikanth Hegde <sshegde@...ux.ibm.com>,
	Vishal Chourasia <vishalc@...ux.ibm.com>,
	"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"paulmck@...nel.org" <paulmck@...nel.org>,
	"frederic@...nel.org" <frederic@...nel.org>,
	"neeraj.upadhyay@...nel.org" <neeraj.upadhyay@...nel.org>,
	"josh@...htriplett.org" <josh@...htriplett.org>,
	"boqun.feng@...il.com" <boqun.feng@...il.com>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"peterz@...radead.org" <peterz@...radead.org>,
	"srikar@...ux.ibm.com" <srikar@...ux.ibm.com>
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug
 operations

On Mon, Jan 12, 2026 at 04:09:49PM +0000, Joel Fernandes wrote:
> 
> 
> > On Jan 12, 2026, at 7:57 AM, Uladzislau Rezki <urezki@...il.com> wrote:
> > 
> > Hello, Shrikanth!
> > 
> >> 
> >>> On 1/12/26 3:38 PM, Uladzislau Rezki wrote:
> >>> On Mon, Jan 12, 2026 at 03:13:33PM +0530, Vishal Chourasia wrote:
> >>>> Bulk CPU hotplug operations—such as switching SMT modes across all
> >>>> cores—require hotplugging multiple CPUs in rapid succession. On large
> >>>> systems, this process takes significant time, increasing as the number
> >>>> of CPUs grows, leading to substantial delays on high-core-count
> >>>> machines. Analysis [1] reveals that the majority of this time is spent
> >>>> waiting for synchronize_rcu().
> >>>> 
> >>>> Expedite synchronize_rcu() during the hotplug path to accelerate the
> >>>> operation. Since CPU hotplug is a user-initiated administrative task,
> >>>> it should complete as quickly as possible.
> >>>> 
> >>>> Performance data on a PPC64 system with 400 CPUs:
> >>>> 
> >>>> + ppc64_cpu --smt=1 (SMT8 to SMT1)
> >>>> Before: real 1m14.792s
> >>>> After:  real 0m03.205s  # ~23x improvement
> >>>> 
> >>>> + ppc64_cpu --smt=8 (SMT1 to SMT8)
> >>>> Before: real 2m27.695s
> >>>> After:  real 0m02.510s  # ~58x improvement
> >>>> 
> >>>> The above numbers were collected on Linux 6.19.0-rc4-00310-g755bc1335e3b
> >>>> 
> >>>> [1] https://lore.kernel.org/all/5f2ab8a44d685701fe36cdaa8042a1aef215d10d.camel@linux.vnet.ibm.com
> >>>> 
> >>> Also, you can try: echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> >>> to speed up the regular synchronize_rcu() call. But I am not saying that it would
> >>> beat your "expedited switch" improvement.
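
As a rough illustration of trying that knob around one of the SMT switches
measured above, a minimal sketch (the parameter path and the ppc64_cpu
invocation are the ones quoted in this thread; saving and restoring the old
value is only an assumption about careful usage):

    # Sketch: enable normal-GP wakeups, time a bulk SMT switch, restore.
    param=/sys/module/rcutree/parameters/rcu_normal_wake_from_gp
    old=$(cat "$param")
    echo 1 > "$param"
    time ppc64_cpu --smt=1
    echo "$old" > "$param"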
> >>> 
> >> 
> >> Hi Uladzislau.
> >> 
> >> We had a discussion on this at LPC; having an in-kernel solution is likely
> >> better than having it in userspace.
> >> 
> >> - Having it in the kernel would make it work across all arches. Why should
> >>  any user have to wait when they initiate a hotplug?
> >> 
> >> - Userspace tools are spread around, such as chcpu, ppc64_cpu, etc.,
> >>  though internally most do "0/1 > /sys/devices/system/cpu/cpuN/online".
> >>  We would have to repeat the same change in each tool.
> >> 
> >> - There is already /sys/kernel/rcu_expedited, which is the better option if
> >>  we need to fall back to userspace at all (a shell sketch follows below).
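
A minimal sketch of that userspace fallback, bracketing a bulk offline loop
with /sys/kernel/rcu_expedited (the sysfs paths are the ones listed above;
which CPUs get offlined and restoring the previous value are only assumptions
for illustration):

    # Sketch: expedite RCU grace periods only for the duration of a bulk
    # hotplug operation, then restore the previous setting.
    old=$(cat /sys/kernel/rcu_expedited)
    echo 1 > /sys/kernel/rcu_expedited
    for cpu in /sys/devices/system/cpu/cpu[1-9]*; do
            echo 0 > "$cpu/online"   # offline each secondary CPU
    done
    echo "$old" > /sys/kernel/rcu_expedited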
> >> 
> > Sounds good to me. I agree it is better to bypass the module parameters.
> 
> Another way to do it in-kernel would be to enable the RCU normal-wake-from-GP optimization by default on systems with more than 16 CPUs.
> 
> I was considering this, but I did not bring it up because, until now, I did not know that there are large systems that might benefit from it.
> 
IMO, we can increase that threshold; 512/1024 is not a problem at all.
But as Paul mentioned, we should consider scalability enhancements. On
the other hand, it is probably also worth waiting until we actually see
those scalability problems :)
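
Until an in-kernel default along those lines exists, a minimal boot-time
sketch of the same idea from an init script (the 16-CPU cutoff is only the
figure mentioned above, and the parameter may be absent on kernels without
this feature):

    # Sketch: enable normal-GP wakeups at boot on larger systems only.
    if [ "$(nproc)" -gt 16 ]; then
            echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
    fi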

--
Uladzislau Rezki
