[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <l4v4gqv36nwdhvc2jgyvonnusakkd7kfrrecetlxsv74agemin@txdp7tzwnwpp>
Date: Tue, 6 Jan 2026 11:21:59 -0600
From: Bjorn Andersson <andersson@...nel.org>
To: Manivannan Sadhasivam <mani@...nel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@....qualcomm.com>,
Ulf Hansson <ulf.hansson@...aro.org>, konradybcio@...nel.org, linux-arm-msm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/1] soc: qcom: rpmh-rsc: Register s2idle_ops to
indicate s2ram behavior in s2idle
On Mon, Jan 05, 2026 at 10:27:41PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Jan 05, 2026 at 08:04:54AM -0600, Bjorn Andersson wrote:
> > On Thu, Jan 01, 2026 at 10:21:57PM +0530, Manivannan Sadhasivam wrote:
> > > Hi,
> > >
> >
> > Many thanks for looking into this problem, Mani.
> >
> > > This is just an attempt to let the device drivers know of the quirky platform
> > > behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and
> > > non-chromebooks) using RPMh. This information is important for the device
> > > drivers as they need to prepare for the possible power loss during system
> > > suspend by shutting down or resetting the devices.
> > >
> >
> > Is this really what happens? What _exactly_ is the purpose of
> > pm_suspend_via_firmware() and pm_resume_via_firmware()?
> >
> > The kernel-doc for pm_suspend_via_firmware() states "return true if
> > platform firmware is going to be invoked at end of system suspend...in
> > order to complete it".
> >
> > To me this indicates that the purpose of this mechanism is to signal
> > Linux that we're running on a target where platform firmware will cut
> > power once we reach the bottom of the suspend chain, so the individual
> > drivers has to save/restore state in order to prepare for this.
> >
>
> Yes!
>
But that's not the case here. In fact, iirc even on Makena the PCIe
controller is retained, it's just the link that can't be sustained
without CX - but I might misremember.
> > But if I understand the bottom of the s2idle sequence, is that all we do
> > is enter idle and the kernel will pick the lowest idle idle-state, which
> > based on the psci-suspend-param will signal the system that the
> > resources needed by the CPU subsystem can be powered down.
> >
> > But it's not really powering things down, it's releasing the vote of the
> > CPU subsystem - which if the driver didn't vote means power will be
> > lost. So for any drivers that relies on the implicit vote from the CPU
> > subsystem to sustain their operation this flag is correct.
> >
> > For many IP blocks, the register state and/or some functionality will be
> > retained in this state - e.g. register state will be retained, or
> > functionality will be sustained by sleep power. So they can safely
> > ignore the flag.
> >
> > In some cases state isn't retained when this happens, so e.g. PHY
> > drivers does perform save and restore operations, despite the flag
> > currently not indicating that power will be lost.
> >
> >
> > What's unique with Makena in this regard, is that in this state the PCIe
> > link can not be sustained, so we either need to tear things down, or the
> > PCIe controller needs to vote directly on those resources it need in
> > order to sustain its link.
> >
> > It seems to me that this patch uses the global flag, in order to signal
> > the PCIe stack specifically, that it needs to save/restore state. I'm
> > concerned that support for retaining state in the PCIe subsystem isn't a
> > global question.
> >
>
> No, this is not a PCIe stack limitation, but PCIe client driver limitation
> like NVMe. PCIe stack does save and restore the device config space during
> system suspend (even during runtime suspend sometimes).
>
> But the NVMe driver will only shutdown the controller if this global flag is set
> or other conditions are satisfied. Otherwise, it will just park the device in
> low power state. So if the target enters s2ram (deep sleep) or CXPC in Makena,
> controller context will be lost, leading to resume failure.
>
So, you're saying that this is the only PCIe client driver that needs
this special handling?
> We tried to fix the issue in the NVMe driver so far, but all efforts ended up in
> vain. NVMe maintainers preferred to rely on some PCI/PM APIs for this. Initially
> I looked into creating one such API, but then figured out that I can just make
> use of this global flag and not touch the NVMe driver at all.
>
At the same time acpi_storage_d3() tells me that this problem is already
accepted by the community, we just don't have a way to signal the same
in our system.
> This worked well for s2ram, so I resent a patch submitted by Konrad in 2024 [1].
> But for s2idle and Makena, I thought I could reuse the same global flag and
> achieve the same net result.
>
My problem remains that we're using the global "power to all IP-blocks
will be lost"-flag to say "on Makena, the PCIe controller doesn't retain
state in low power mode", to a single driver.
And in particular, when taking DeepSleep into consideration, there's
going to be a lot of work performed by drivers when "power to all
IP-blocks will be lost".
> [1] https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com
>
> >
> >
> > Then parallel to this discussion, we have the actual s2ram feature
> > showing up on some targets, where the SoC actually do power off. Here
> > the automatic retaining of register and functional state is not going to
> > happen - device drivers must save and restore state.
> >
>
> AFAIK, atleast on compute targets, s2ram == s2idle + CXPC. Only difference is
> that the system wide low power mode transition will happen in PSCI
> SYSTEM_SUSPEND phase as opposed to CPU_SUSPEND.
>
But this means that our normal assumptions about retention holds, except
for PCIe on Makena.
> Only on auto targets, we have the 'deep sleep' feature which behaves like x86
> s2ram i.e., turning off power to most of the components. But from Linux POV,
> both are just s2ram. This is one more motivation for me to use this global flag
> for s2ram.
>
Not only on auto targets, the DeepSleep feature exists in IoT and other
places as well.
As I said in my other reply, we can decide that DeepSleep == s2ram from
Linux PoV, but that has implications - beyond what I think you're
willing to accept on your laptop.
> Oh there is one more limitation with doing CXPC without D3Cold (link retention).
> Linux cannot handle PCIe wakeup reliably. So if you have WLAN or USB keyboard
> connected to a PCIe-USB hub, wakeup will not work and will result in PCIe link
> down with upstream. We need to have a hw mechanism to handle PCIe wakeup. But
> that is not currently supported and I don't know when that support will land.
>
> So if wakeup is not supported, I thought doing D3Cold + CXPC would be better in
> terms of power saving (not much difference, but still...). This is the only
> motivation for me to use this flag for all pre-Hamoa SoCs.
>
I'm glad that you're looking into this part, we want that.
> > Do we force all "pre-Hamoa" targets into this same behavior? Or do we
> > have a different flag for saying "at the end of suspend power will be
> > lost" there?
> >
>
> If you don't agree with the above reasoning, I can just limit its usage to
> Makena and s2ram.
>
I don't think I have all the details, but if we're saying that Makena is
broken and need special treatment in NVMe, it would be better to say
just that, with a patch in check_vendor_combination_bug()
/* Qualcomm SC8280XP can not enter low-power with PCIe link active */
if (of_machine_is_compatible("qcom,sc8280xp"))
return NVME_QUIRK_SIMPLE_SUSPEND;
Alternatively, make it possible to set StorageD3Enable in DeviceTree.
This flag exists in ACPI for a reason, why wouldn't we see the same
problems when describing the system using DeviceTree?
> >
> > PS. Think the commit message of the change itself falls short in
> > capturing the problem. The commit message of this change will become the
> > documentation on which many future discussions will be based.
> >
>
> Sorry about that. I will add more context while sending non-RFC version.
>
> > > For implementation, s2idle_ops is registered during the boot of the rpmh-rsc
> > > driver based on the machine compatible limited to Makena (SC8280XP) as a
> > > proof-of-concept. If this approach gets consensus, I plan to have a helper
> > > that lists the compatibles of all SoCs exhibiting this behavior. Since there is
> > > no reliable way to find out whether s2idle is the only low power state supported
> > > or not during boot, I resorted to compatible based matching.
> > >
> > > One could argue that this s2idle_ops should be registered in the PSCI driver
> > > similar to s2ram [1]. But I didn't prefer that since from PSCI point of view,
> > > only CPUs should be parked in low power states during s2idle (CPU_SUSPEND) and
> > > the peripherals should not be affected. Though in the past, an argument [2] was
> > > raised citing the PSCI spec wording that allows the vendors to implement system
> > > level low power states during CPU_SUSPEND. But that argument was not well
> > > received by the PSCI maintainers.
> > >
> >
> > It makes sense to me to have PSCI only manage CPUs and its necessary
> > peripherals.
> >
> > But the definition of pm_suspend_via_firmware() in our case would then
> > be, "if your driver piggybacks on e.g. the CPU subsystem's vote for
> > critical resources, you might be in for a surprise".
> >
> > Which makes me feel that the Makena quirk should be in the PCIe
> > controller driver...
> >
>
> Do you still think so?
>
Yes, not based on the current state of the kernel. But all the
implications that will follow, in particular in relation to DeepSleep.
Regards,
Bjorn
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists