Message-ID: <aVfhhXHfFE6lzlzp@willie-the-truck>
Date: Fri, 2 Jan 2026 15:17:25 +0000
From: Will Deacon <will@...nel.org>
To: Pavan Kondeti <pavan.kondeti@....qualcomm.com>
Cc: Mark Rutland <mark.rutland@....com>,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-arm-msm@...r.kernel.org
Subject: Re: SMP boot issue during system resume

On Mon, Dec 22, 2025 at 11:30:19AM +0530, Pavan Kondeti wrote:
> We are seeing an SMP boot issue during system resume when CPUs are brought
> online via pm_sleep_enable_secondary_cpus()->thaw_secondary_cpus()->_cpu_up()
> on ARM64.
> 
> The _cpu_up() sets a global variable
> 
> secondary_data.task = idle;
> 
> and waits for the secondary CPU to come online. A 5 second timeout is
> used here. If the secondary CPU comes online only after this timeout,
> we expect it to loop in the kernel via __secondary_too_slow(). However,
> this depends on the value of secondary_data.task. Since we are bringing
> up all disabled cores, after the timeout we set this global variable to
> the next CPU's idle task, and the late secondary CPU thinks the value is
> its own idle task and does not enter __secondary_too_slow().
> 
> An earlier attempt [1] to fix a similar issue increased the timeout to 5
> seconds. We could reproduce this issue in a Linux guest, where vCPU
> scheduling latency can be higher under heavy load on the host.
> 
> I would like to seek your input on how we can improve the current
> situation. We would like to avoid the __secondary_too_slow() spin even
> when the CPU comes up late. This is probably not desired behavior for
> other cases, like Linux running on bare metal or in some guests. Having
> a Kconfig option or kernel param might help here.

You probably want to use the parallel hotplug machinery (or one of the
interim steps) for this, as it avoids the global state entirely. I spoke
about it at KVM forum [1] and I have some old hacks at [2]. I can dust
those off and post them to the list if you like?

Will

[1] https://www.youtube.com/watch?v=Q6kOshnnQuE
[2] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=cpu-hotplug
