Message-ID: <e71b9bc5-42c1-4295-998e-4a9d71d84b25@quicinc.com>
Date: Mon, 5 Jan 2026 16:24:44 +0530
From: Pavan Kondeti <pavan.kondeti@....qualcomm.com>
To: Will Deacon <will@...nel.org>
Cc: Pavan Kondeti <pavan.kondeti@....qualcomm.com>,
Mark Rutland <mark.rutland@....com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-arm-msm@...r.kernel.org
Subject: Re: SMP boot issue during system resume

On Fri, Jan 02, 2026 at 03:17:25PM +0000, Will Deacon wrote:
> On Mon, Dec 22, 2025 at 11:30:19AM +0530, Pavan Kondeti wrote:
> > We are seeing an SMP boot issue during system resume when CPUs are brought
> > online via pm_sleep_enable_secondary_cpus()->thaw_secondary_cpus()->_cpu_up()
> > on ARM64.
> >
> > The _cpu_up() sets a global variable
> >
> > secondary_data.task = idle;
> >
> > and waits for the secondary CPU to come online, with a 5 second timeout.
> > If the secondary CPU comes online only after this timeout, we expect it
> > to loop in the kernel via __secondary_too_slow(). However, this depends
> > on the value of secondary_data.task. Since we are bringing up all the
> > disabled cores, after the timeout we set this global variable to the
> > next CPU's idle task, and the late secondary CPU takes that value for
> > its own idle task and does not enter __secondary_too_slow().
> >
> > An earlier attempt [1] to fix a similar issue increased the timeout to 5
> > seconds. We could reproduce this issue in a Linux guest, where vCPU
> > scheduling latency can be higher under heavy load on the host.
> >
> > I would like to seek your input on how we can improve the current
> > situation. We would like to avoid the __secondary_too_slow() spin even
> > when the CPU comes up late. This is probably not the desired behavior in
> > other cases, such as Linux running on bare metal or in some guests, so a
> > Kconfig option or kernel parameter might help here.
>
> You probably want to use the parallel hotplug machinery (or one of the
> interim steps) for this, as it avoids the global state entirely. I spoke
> about it at KVM forum [1] and I have some old hacks at [2]. I can dust
> those off and post them to the list if you like?

Thanks, Will, for pointing to your informative talk. I see that your patch
depends on the PSCI v0.2 extension to CPU_ON (context argument) [1]. I am
not sure this suits our immediate needs, but it is good to know that there
is a plan for parallel vCPU hotplug.

I am happy to test any other patches you have that address or work around
this problem without depending on the backend/firmware.
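For reference, the race from the quoted report can be modelled roughly as
below. This is only a userspace sketch, not the arch/arm64 code: struct task,
the helper names, and the single-threaded replay of the ordering are all made
up for illustration, and the one shared slot stands in for
secondary_data.task.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an idle task; not the kernel's task_struct. */
struct task {
	int cpu;
};

/* One global slot shared by every secondary boot, as in the report. */
static struct {
	struct task *task;
} secondary_data;

/* Boot CPU: publish the idle task for the CPU being brought up. */
static void boot_cpu_start(struct task *idle)
{
	secondary_data.task = idle;
}

/* Boot CPU: the wait timed out, so mark the slot as abandoned. */
static void boot_cpu_timeout(void)
{
	secondary_data.task = NULL;
}

/*
 * Secondary CPU: read the shared slot. Returns true if the CPU parks
 * (the __secondary_too_slow() case), false if it proceeds with *picked.
 */
static bool secondary_parks(struct task **picked)
{
	*picked = secondary_data.task;
	return *picked == NULL;
}

/* Ordering from the report: CPU1 arrives after CPU2's boot has started. */
static bool late_cpu_adopts_wrong_task(void)
{
	struct task idle1 = { .cpu = 1 }, idle2 = { .cpu = 2 };
	struct task *picked;

	boot_cpu_start(&idle1);		/* start CPU1 */
	boot_cpu_timeout();		/* CPU1 missed the timeout */
	boot_cpu_start(&idle2);		/* slot reused for CPU2 */

	if (secondary_parks(&picked))	/* late CPU1 reads the slot */
		return false;
	return picked->cpu != 1;	/* proceeded with CPU2's idle task */
}

/* Control case: CPU1 arrives late, but no other boot has reused the slot. */
static bool late_cpu_parks_if_slot_not_reused(void)
{
	struct task idle1 = { .cpu = 1 };
	struct task *picked;

	boot_cpu_start(&idle1);
	boot_cpu_timeout();
	return secondary_parks(&picked);
}
```

In this model the late CPU parks as intended only while the slot is still
NULL. Once the next boot has reused the slot, the late CPU cannot tell that
the value was not meant for it.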
Thanks,
Pavan

[1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=cpu-hotplug&id=6ac1e52f7cdfc2437dbe3ea727bd01df342a0fbc