Message-ID: <aWE7wy4tyLsnEdXc@linaro.org>
Date: Fri, 9 Jan 2026 18:33:18 +0100
From: Stephan Gerhold <stephan.gerhold@...aro.org>
To: Bjorn Andersson <andersson@...nel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@....qualcomm.com>,
	Konrad Dybcio <konrad.dybcio@....qualcomm.com>,
	Krishna Chaitanya Chundru <krishna.chundru@....qualcomm.com>,
	Michael Turquette <mturquette@...libre.com>,
	Stephen Boyd <sboyd@...nel.org>, Taniya Das <quic_tdas@...cinc.com>,
	Konrad Dybcio <konradybcio@...nel.org>,
	Bartosz Golaszewski <brgl@...nel.org>,
	Shazad Hussain <quic_shazhuss@...cinc.com>,
	Sibi Sankar <sibi.sankar@....qualcomm.com>,
	Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
	Melody Olvera <quic_molvera@...cinc.com>,
	Dmitry Baryshkov <lumag@...nel.org>,
	Taniya Das <taniya.das@....qualcomm.com>,
	Dmitry Baryshkov <dmitry.baryshkov@....qualcomm.com>,
	Imran Shaik <quic_imrashai@...cinc.com>,
	Abel Vesa <abelvesa@...nel.org>, linux-arm-msm@...r.kernel.org,
	linux-clk@...r.kernel.org, linux-kernel@...r.kernel.org,
	Rajendra Nayak <quic_rjendra@...cinc.com>, stable@...r.kernel.org
Subject: Re: [PATCH 0/7] clk: qcom: gcc: Do not turn off PCIe GDSCs during
 gdsc_disable()

On Fri, Jan 09, 2026 at 09:49:52AM -0600, Bjorn Andersson wrote:
> On Mon, Jan 05, 2026 at 10:47:29AM +0100, Stephan Gerhold wrote:
> > On Mon, Jan 05, 2026 at 10:44:39AM +0530, Manivannan Sadhasivam wrote:
> > > On Fri, Jan 02, 2026 at 02:57:56PM +0100, Konrad Dybcio wrote:
> > > > On 1/2/26 2:19 PM, Krishna Chaitanya Chundru wrote:
> > > > > On 1/2/2026 5:09 PM, Konrad Dybcio wrote:
> > > > >> On 1/2/26 12:36 PM, Krishna Chaitanya Chundru wrote:
> > > > >>> On 1/2/2026 5:04 PM, Konrad Dybcio wrote:
> > > > >>>> On 1/2/26 10:43 AM, Krishna Chaitanya Chundru wrote:
> > > > >>>>> With PWRSTS_OFF_ON, PCIe GDSCs are turned off during gdsc_disable(). This
> > > > >>>>> can happen during scenarios such as system suspend and breaks the resume
> > > > >>>>> of PCIe controllers from suspend.
> > > > >>>> Isn't turning the GDSCs off what we want though? At least during system
> > > > >>>> suspend?
> > > > > >>> If we were keeping the link in D3cold it would make sense, but currently we are not keeping it in D3cold,
> > > > > >>> so we don't expect them to turn off.
> > > > >> Since we seem to be tackling that in parallel, it seems to make sense
> > > > >> that adding a mechanism to let the PCIe driver select "on" vs "ret" vs
> > > > >> "off" could be useful for us
> > > > > At least I am not aware of such API where we can tell genpd not to turn off gdsc
> > > > > at runtime if we are keeping the device in D3cold state.
> > > > > But anyway the PCIe gdsc supports Retention, in that case adding this flag here makes
> > > > > more sense as it represents HW.
> > > > > sm8450,sm8650 also had similar problem which are fixed by mani[1].
> > > > 
> > > > Perhaps I should ask for a clarification - is retention superior to
> > > > powering the GDSC off? Does it have any power costs?
> > > > 
> > > 
> > > In terms of power saving it is not superior, but that's not the only factor we
> > > should consider here. If we keep GDSCs PWRSTS_OFF_ON, then the devices (PCIe)
> > > need to be in D3Cold. Sure, we can change that using the new genpd API
> > > dev_pm_genpd_rpm_always_on() dynamically, but I would prefer to avoid doing
> > > that.
> > > 
> > > In my POV, the GDSCs' default state should be retention, so that the GDSCs will stay
> > > ON if retention is not entered in hw and enter retention otherwise. This
> > > requires no extra modification in the genpd client drivers. One more benefit is,
> > > the hw can enter low power state even when the device is not in D3Cold state
> > > i.e., during s2idle (provided we unvote other resources).
> > > 
> > 
> > What about PCIe instances that are completely unused? The boot firmware
> > on X1E for example is notorious for powering on completely unused PCIe
> > links and powering them down in some half-baked off state (the &pcie3
> > instance, in particular). I'm not sure if the GDSC remains on, but if it
> > does then the unused PD cleanup would also only put them in retention
> > state. I can't think of a good reason to keep those on at all.
> > 
> 
> Conceptually I agree, but do we have any data indicating that there's
> practical benefit to this complication?
> 

No, I also suggested this only from the conceptual perspective. It would
be interesting to test this, but unfortunately I don't have a suitable
device for testing this anymore.

> > The implementation of PWRSTS_RET_ON essentially makes the PD power_off()
> > callback a no-op. Everything in Linux (sysfs, debugfs, ...) will tell
> > you that the power domain has been shut down, but at the end it will
> > remain fully powered until you manage to reach a retention state for the
> > parent power domain. Due to other consumers, that will likely happen
> > only if you reach VDDmin or some equivalent SoC-wide low-power state,
> > something barely any (or none?) of the platforms supported upstream is
> > capable of today.
> > 
> 
> Yes, PWRSTS_RET_ON effectively means that Linux has "dropped its vote"
> on the GDSC and its parents. But with the caveat that we assume when
> going to ON again some state will have been retained.
> 
> > PWRSTS_RET_ON is actually pretty close to setting GENPD_FLAG_ALWAYS_ON,
> > the only advantage of PWRSTS_RET_ON I can think of is that unused GDSCs
> > remain off iff you are lucky enough that the boot firmware has not
> > already turned them on.
> > 
> 
> Doesn't GENPD_FLAG_ALWAYS_ON imply that the parent will also be always
> on?
> 

It probably does, but isn't that exactly what you want? If the parent
(or the GDSC itself) would actually *power off* (as in "pull the plug"),
then you would still lose registers even if the GDSC remains on. The
fact that PWRSTS_RET_ON works without keeping the parent on is probably
just because the hardware keeps the parent domain always-on?

> > IMHO, for GDSCs that support OFF state in the hardware, PWRSTS_RET_ON is
> > a hack to work around limitations in the consumer drivers. They should
> > either save/restore registers and handle the power collapse or they
> > should vote for the power domain to stay on. That way, sysfs/debugfs
> > will show the real votes held by Linux and you won't be misled when
> > looking at those while trying to optimize power consumption.
> > 
> 
> No, it's not working around limitations in the consumer drivers.
> 
> It does work around a limitation in the API, in that the consumer
> drivers can't indicate in which cases they would be willing to restore
> and in which cases they would prefer retention. This is something the
> downstream solution has had, but we don't have a sensible and generic
> way to provide this.

I might be missing something obvious, but mapping this to the existing
pmdomain API feels pretty straightforward to me:

 - Power on/power off means "pull the plug", i.e. if you vote for a
   pmdomain to power off you should expect that registers get lost.
   That's exactly what will typically happen if the hardware actually
   removes power completely from the power domain.

 - If you want to preserve registers (retention), you need to tell the
   hardware to keep the pmdomain powered on at a minimum voltage
   (= performance state). In fact, the PCIe GDSC already inherits
   support for RPMH_REGULATOR_LEVEL_RETENTION from its parent domain.
   (If RPMH_REGULATOR_LEVEL_RETENTION happens to be higher than the
    retention state we are talking about here you could also just vote
    for 0 performance state...)

With this, the only additional feature you need from the pmdomain API is
to disable its sometimes inconvenient behavior of automatically disabling
all pmdomains during system suspend (independent of the votes made by
drivers). I believe this exists already in different forms. Back when
I needed something like this for cpufreq on MSM8909, Ulf suggested using
device_set_awake_path(), see commit d6048a19a710 ("cpufreq: qcom-nvmem:
Preserve PM domain votes in system suspend"). I'm not entirely up to
date on the best current way to do this, but letting a driver
preserve its votes across system suspend feels like a common enough
requirement that should be supported by the pmdomain API.
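
As a rough illustration of the mapping described above, here is a toy
model in plain C. It is not kernel code and does not use the pmdomain
API: all names (toy_pd, keep_in_suspend, toy_pd_vote(), ...) are made up
for this sketch. It only models three assumptions: a domain at OFF loses
its registers, a domain held at a retention level keeps them, and system
suspend overrides the consumer's vote unless the consumer opted out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel or pmdomain APIs. */
enum pd_level { PD_OFF, PD_RETENTION, PD_NOMINAL };

struct toy_pd {
	enum pd_level vote;     /* level requested by the consumer driver */
	enum pd_level hw_level; /* level the (modelled) hardware is at */
	bool keep_in_suspend;   /* consumer wants its vote kept in suspend */
	bool regs_valid;        /* consumer registers still hold their state */
};

/* Apply a level to the modelled hardware. */
static void toy_pd_apply(struct toy_pd *pd, enum pd_level level)
{
	pd->hw_level = level;
	if (level == PD_OFF)
		pd->regs_valid = false; /* "pull the plug": registers are lost */
}

/* Consumer vote: OFF means power off, anything else keeps the domain up. */
static void toy_pd_vote(struct toy_pd *pd, enum pd_level level)
{
	assert(level == PD_OFF || level == PD_RETENTION || level == PD_NOMINAL);
	pd->vote = level;
	toy_pd_apply(pd, level);
}

/* Consumer programs its registers; assumed to need the nominal level. */
static bool toy_pd_program(struct toy_pd *pd)
{
	if (pd->hw_level != PD_NOMINAL)
		return false;
	pd->regs_valid = true;
	return true;
}

/* Suspend powers the domain off regardless of votes, unless opted out. */
static void toy_pd_suspend(struct toy_pd *pd)
{
	toy_pd_apply(pd, pd->keep_in_suspend ? pd->vote : PD_OFF);
}

/* Resume restores whatever the consumer had voted for. */
static void toy_pd_resume(struct toy_pd *pd)
{
	toy_pd_apply(pd, pd->vote);
}
```

In this model, a consumer that votes for PD_RETENTION and sets
keep_in_suspend still has regs_valid set after a suspend/resume cycle,
while one that leaves keep_in_suspend unset comes back with regs_valid
cleared and has to reprogram its registers. The decision sits with the
consumer, not with per-SoC data in the power domain driver.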

> 
> Keeping GDSCs in retention is a huge gain when it comes to the time it
> takes to resume the system after being in low power. PCIe is a good
> example of this, where the GDSC certainly supports entering OFF, at the
> cost of tearing the link and all down.
> 

I don't doubt that. My point is that the PCIe driver should make that
decision and not the (semi-)generic power domain driver that does not
exactly know who (or if anyone) is going to consume its power domain.
Especially because this decision is encoded in SoC-specific data and we
had plenty of patches already changing PWRSTS_OFF_ON to PWRSTS_RET_ON
due to suspend issues initially unnoticed on some SoCs (or vice versa to
save power).

Thanks,
Stephan
