Message-ID: <c970cf8d-a174-4c10-85ca-00f66056a621@oss.qualcomm.com>
Date: Thu, 19 Dec 2024 20:37:54 +0100
From: Konrad Dybcio <konrad.dybcio@....qualcomm.com>
To: Ulf Hansson <ulf.hansson@...aro.org>,
Konrad Dybcio <konrad.dybcio@....qualcomm.com>,
Maulik Shah <quic_mkshah@...cinc.com>
Cc: Konrad Dybcio <konradybcio@...nel.org>, Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Marijn Suijten <marijn.suijten@...ainline.org>,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
Bjorn Andersson <bjorn.andersson@....qualcomm.com>,
Sudeep Holla <sudeep.holla@....com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND
PSCI impls
On 6.12.2024 10:53 AM, Ulf Hansson wrote:
> + Maulik, Vincent
>
> On Thu, 5 Dec 2024 at 21:34, Konrad Dybcio
> <konrad.dybcio@....qualcomm.com> wrote:
>>
>> On 14.11.2024 4:30 PM, Ulf Hansson wrote:
>>> On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@...nel.org> wrote:
>>>>
>>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>>>> CPU_SUSPEND instead. Inform Linux about that.
>>>> Please see the commit messages for a more detailed explanation.
>>>>
>>>> This is effectively a more educated follow-up to [1].
>>>>
>>>> The ultimate goal is to stop making Linux think that certain states
>>>> only concern cores/clusters, and consequently setting
>>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>>>> NVMe, see related discussion over at [2]) can make informed decisions
>>>> about assuming the power state of the device they govern.
>>>
>>> In my opinion, this is not really the correct way to do it. Using
>>> pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
>>> for PSCI like this. Let me elaborate. If the NVMe storage device is
>>> sharing the same power-rail as the CPU cluster, then yes we should use
>>> PSCI to control it. But is that really the case? If so, there are in
>>> principle two ways forward to deal with this correctly.
>>>
>>> 1) If PSCI OSI mode is being used, the corresponding NVMe storage
>>> device should be hooked up to the CPU PM cluster domain via genpd and
>>> controlled as any other devices sharing the cluster-rail. In this way,
>>> genpd together with the cpuidle-psci-domain can decide whether it's
>>> okay to turn off the cluster. I believe this is the preferred way, but
>>> 2) would work fine too.
>>>
>>> 2) If PSCI PC mode is being used, a separate channel/interface to the
>>> FW (like SCMI or rpmh in the QC case), should inform the FW whether
>>> NVMe needs the power to it. This information should then be taken into
>>> account by the PSCI FW when it decides what low-power-state to enter,
>>> which ultimately means whether the cluster-rail can be turned off or
>>> not.
>>
>> This assumes PSCI only governs the CPU power rail. But what I'd
>> guesstimate is that in most implementations if system-level suspend is
>> there at all (no matter through which call), as per the spec, it at
>> least also projects onto the DDR power state (like in this i.mx
>> impl here [1]), or some uncore peripherals (like in Tegra's case with
>> some secure element being toggled at [2])
>
> Right, I certainly understand the above. There are different parts of
> an SoC that may be sharing the same power-island as the CPUs.
>
> The question here is whether the NVMe storage device is part of that
> power-island too on some QC SoCs?
Yes, but not exclusively (i.e. there can also be other voltage rails or
similar that may or may not be managed by Linux, depending on the SoC).
>>> Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
>>> please elaborate on why, so we can help to make it work, as it should.
>>
>> On Qualcomm platforms, RPMh is the central authority when it comes
>> to power governance, but by design, the CPUs must be off (and with a
>> specific magic cookie) for the RPMh hardware to consider powering off
>> very power hungry parts of the system, such as general i/o rails.
>
> Right, that is why the "qcom,rpmh-rsc" device in many cases belongs to
> the cluster-power-domain (for PSCI). This allows "qcom,rpmh-rsc" to
> control the "last-man" activities and prevent deeper PSCI states
> if/when necessary.
The problem is that today we only describe the RSC connected to the CPU
cluster. Newer SoCs have multiple RSCs, which, long story short, allow
certain IP blocks to operate and have their power managed without the
CPU block being involved, or even online.
The CPU RSC can only reliably probe the CPU online status, as all other
IPs can be requested to stay powered from an external entity (e.g. a DSP,
secure world and similar), so the driver can only do its best to try and
prevent obviously-going-to-fail idle entries when CPUs are online.
>> So again, PSCI must be fed a specific value for the rest of the hw
>> to react. The "S2RAM state" isn't really a cpuidle state, because
>> it doesn't differ from many shallower states as far as the cpu/cluster
>> are concerned. If that all isn't in place, the platform never actually
>> enters any "real" sleep state, other than "CPU and some controllable
>> IP blocks are runtime-suspended".
>
> We recently discussed this, offlist, with Maulik - and I think we need
> some more clarity around what is actually going on here.
>
> In principle, it looks to me that using S2I with just another deeper
> idlestate specified (with another psci-suspend-parameter, representing
> a deeper state) should work fine, at least theoretically. Of course,
> we may not be able to use that idlestate during regular
> cpuidle/runtime but only during S2I, which we need to control in a
> smooth way and that is not currently supported (but can be fixed
> easily, I think).
>
> In the end, it's the psci-suspend-parameter that is given to the PSCI
> FW that informs about what state we can enter.
>
> That said, using S2I may not work without updating the PSCI FW, of
> course. For example, there may be FW limitations that require the
> boot-CPU (CPU0) to be the last one for these deeper low-power-states.
> Whether that is just a FW limitation or whether there are some
> additional HW constraints that enforce this, needs to be clarified.
Yeah, not being able to runtime-idle into that state is one issue.
Another is that successfully entering the S2RAM state may require us
to reinitialize some hardware. Currently, Linux has no way of knowing
that this state is any different from the rest, but marking it as
S2RAM would allow checking for PM_SUSPEND_MEM vs PM_SUSPEND_TO_IDLE.
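
A rough sketch of the kind of PM_SUSPEND_MEM vs PM_SUSPEND_TO_IDLE check
meant here (illustration only, not code from this series). In the kernel
a driver would compare pm_suspend_target_state against these values; the
snippet below redefines them locally so it builds in userspace, with
values assumed to mirror include/linux/suspend.h.
needs_reinit_after_resume() is a hypothetical helper name.

```c
#include <stdbool.h>

/*
 * Userspace stand-ins for the kernel's suspend_state_t values.
 * Assumption: these mirror include/linux/suspend.h.
 */
typedef int suspend_state_t;
#define PM_SUSPEND_ON		((suspend_state_t)0)
#define PM_SUSPEND_TO_IDLE	((suspend_state_t)1)
#define PM_SUSPEND_STANDBY	((suspend_state_t)2)
#define PM_SUSPEND_MEM		((suspend_state_t)3)

/*
 * Hypothetical helper: should a driver assume its hardware may have
 * lost state across this suspend cycle?
 *
 * Assumption for this sketch: only a real S2RAM entry lets the
 * firmware power the block off behind Linux's back, while s2idle
 * keeps it retained.
 */
static bool needs_reinit_after_resume(suspend_state_t target)
{
	return target == PM_SUSPEND_MEM;
}
```

A driver's resume path could then reinitialize the hardware only when
the helper returns true, and skip that work after s2idle.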
>> This effectively is very close to what ACPI+x86 do - there's a
>> co-processor/firmware that does a lot of things behind your back and
>> all you can do is *ask* it to change some handwavily-defined P/Cstate
>> that affects a huge chunk of silicon.
>
> Yep, there are similarities.
>
> However, ACPI is for generic device power management. PSCI requires
> something additional, such as ARM SCMI or QC's rpm/rsc interface.
Right, we're not yet fully there with "for_each_device(fw_shut_down())"
Konrad
>
>>
>> Konrad
>>
>> [1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474
>> [2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214
>>
>
> Kind regards
> Uffe