Message-ID: <72add0ff-db2a-44f8-b4af-0b5fe5f65f20@oss.qualcomm.com>
Date: Thu, 5 Dec 2024 21:34:38 +0100
From: Konrad Dybcio <konrad.dybcio@....qualcomm.com>
To: Ulf Hansson <ulf.hansson@...aro.org>,
        Konrad Dybcio <konradybcio@...nel.org>
Cc: Rob Herring <robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>,
        Conor Dooley <conor+dt@...nel.org>,
        Lorenzo Pieralisi
 <lpieralisi@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Marijn Suijten <marijn.suijten@...ainline.org>,
        devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        Bjorn Andersson <bjorn.andersson@....qualcomm.com>,
        Konrad Dybcio <konrad.dybcio@....qualcomm.com>,
        Sudeep Holla <sudeep.holla@....com>
Subject: Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND
 PSCI impls
On 14.11.2024 4:30 PM, Ulf Hansson wrote:
> On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@...nel.org> wrote:
>>
>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>> CPU_SUSPEND instead. Inform Linux about that.
>> Please see the commit messages for a more detailed explanation.
>>
>> This is effectively a more educated follow-up to [1].
>>
>> The ultimate goal is to stop making Linux think that certain states
>> only concern cores/clusters, and consequently setting
>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>> NVMe, see related discussion over at [2]) can make informed decisions
>> about assuming the power state of the device they govern.
> 
> In my opinion, this is not really the correct way to do it. Using
> pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
> for PSCI like this. Let me elaborate. If the NVMe storage device is
> sharing the same power-rail as the CPU cluster, then yes we should use
> PSCI to control it. But is that really the case? If so, there are in
> principle two ways forward to deal with this correctly.
> 
> 1) If PSCI OSI mode is being used, the corresponding NVMe storage
> device should be hooked up to the CPU PM cluster domain via genpd and
> controlled as any other devices sharing the cluster-rail. In this way,
> genpd together with the cpuidle-psci-domain can decide whether it's
> okay to turn off the cluster. I believe this is the preferred way, but
> 2) would work fine too.
> 
> 2) If PSCI PC mode is being used, a separate channel/interface to the
> FW (like SCMI or rpmh in the QC case), should inform the FW whether
> NVMe needs the power to it. This information should then be taken into
> account by the PSCI FW when it decides what low-power-state to enter,
> which ultimately means whether the cluster-rail can be turned off or
> not.

This assumes PSCI only governs the CPU power rail. My guess, though, is
that in most implementations where system-level suspend exists at all
(no matter which call exposes it), as per the spec, it also affects at
least the DDR power state (as in the i.MX implementation at [1]) or
some uncore peripherals (as on Tegra, where a secure element is toggled
at [2]).

> Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
> please elaborate on why, so we can help to make it work, as it should.

On Qualcomm platforms, RPMh is the central authority when it comes
to power governance, but by design, the CPUs must be off (and have been
suspended with a specific magic cookie) for the RPMh hardware to
consider powering off the very power-hungry parts of the system, such
as general I/O rails.

So again, PSCI must be fed a specific value for the rest of the hardware
to react. The "S2RAM state" isn't really a cpuidle state, because
it doesn't differ from many shallower states as far as the CPU/cluster
are concerned. If all of that isn't in place, the platform never
actually enters any "real" sleep state beyond "CPU and some controllable
IP blocks are runtime-suspended".
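
To illustrate the point about the "specific value", here is a minimal
user-space sketch of how a CPU_SUSPEND power_state parameter is packed in
the original (non-extended) PSCI format: StateID in bits [15:0], StateType
in bit 16, PowerLevel in bits [25:24]. The two EXAMPLE_* StateID values are
made up for illustration and are not taken from any real firmware; the
helper names are hypothetical as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout of power_state in the original PSCI format. */
#define PSCI_STATE_ID_MASK      0xffffu  /* bits [15:0]: platform-defined */
#define PSCI_STATE_TYPE_SHIFT   16       /* bit 16: 0 = standby, 1 = powerdown */
#define PSCI_POWER_LEVEL_SHIFT  24       /* bits [25:24] */
#define PSCI_POWER_LEVEL_MASK   0x3u

/* HYPOTHETICAL StateID cookies, invented for this sketch only. */
#define EXAMPLE_CLUSTER_OFF_ID  0x0040u
#define EXAMPLE_S2RAM_ID        0x0140u

/* Pack the three fields into a single power_state word. */
static uint32_t psci_make_power_state(uint32_t state_id, bool powerdown,
                                      uint32_t power_level)
{
    assert(state_id <= PSCI_STATE_ID_MASK);
    assert(power_level <= PSCI_POWER_LEVEL_MASK);

    return state_id |
           ((powerdown ? 1u : 0u) << PSCI_STATE_TYPE_SHIFT) |
           (power_level << PSCI_POWER_LEVEL_SHIFT);
}

/* Extract the architecturally defined fields (type and level) only. */
static uint32_t psci_type_and_level(uint32_t power_state)
{
    return power_state & ~PSCI_STATE_ID_MASK;
}
```

With these made-up cookies, psci_make_power_state(EXAMPLE_CLUSTER_OFF_ID,
true, 1) and psci_make_power_state(EXAMPLE_S2RAM_ID, true, 1) have the same
type and level and differ only in the platform-defined StateID, which is
why the OS cannot tell from the CPU/cluster side alone that one of them
also takes the rest of the system down.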

This is effectively very close to what ACPI+x86 does: there's a
co-processor/firmware that does a lot of things behind your back and
all you can do is *ask* it to change some hand-wavily defined P/C-state
that affects a huge chunk of silicon.
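
For context, the kind of decision the cover letter wants client drivers to
be able to make can be modelled roughly as below. This is a user-space
sketch, not kernel code: the kernel's pm_set_suspend_via_firmware() and
pm_suspend_via_firmware() helpers are replaced by model_* stand-ins, and
the storage "driver" and its two actions are hypothetical simplifications.

```c
#include <stdbool.h>

/* Stand-in for the kernel's global "suspend goes through firmware" flag. */
static bool model_via_firmware;

/* Models pm_set_suspend_via_firmware(): called by the platform suspend
 * code when firmware will take over and may cut power to devices. */
static void model_set_suspend_via_firmware(void)
{
    model_via_firmware = true;
}

/* Models pm_suspend_via_firmware(): queried by drivers at suspend time. */
static bool model_suspend_via_firmware(void)
{
    return model_via_firmware;
}

enum model_suspend_action {
    MODEL_DEVICE_LOW_POWER,   /* keep state, enter a device low-power state */
    MODEL_DEVICE_SHUTDOWN,    /* assume power loss, shut down fully */
};

/* Hypothetical storage driver suspend decision. */
static enum model_suspend_action model_storage_suspend(void)
{
    if (model_suspend_via_firmware())
        return MODEL_DEVICE_SHUTDOWN;

    return MODEL_DEVICE_LOW_POWER;
}
```

If the platform code never sets the flag because it believes the state
only concerns cores/clusters, the driver in this model always picks
MODEL_DEVICE_LOW_POWER, even when the firmware goes on to remove power.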

Konrad

[1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474
[2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214