linux-kernel - Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210426131150.GA36549@e123083-lin>
Date:   Mon, 26 Apr 2021 15:11:50 +0200
From:   Morten Rasmussen <morten.rasmussen@....com>
To:     Sowjanya Komatineni <skomatineni@...dia.com>
Cc:     Lukasz Luba <lukasz.luba@....com>, sudeep.holla@....com,
        souvik.chakravarty@....com, thierry.reding@...il.com,
        mark.rutland@....com, lorenzo.pieralisi@....com,
        daniel.lezcano@...aro.org, robh+dt@...nel.org,
        jonathanh@...dia.com, ksitaraman@...dia.com, sanjayc@...dia.com,
        linux-arm-kernel@...ts.infradead.org, linux-tegra@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        devicetree@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] Support for passing runtime state idle time to
 TF-A

Hi,

On Fri, Apr 23, 2021 at 03:24:51PM -0700, Sowjanya Komatineni wrote:
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> > 
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> > > Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> > > which is
> > > in charge of deciding on state transition based on target state,
> > > state idle
> > > time, and some other Tegra CPU core cluster states information.
> > > 
> > > Current PSCI specification don't have function defined for passing
> > > runtime
> > > state idle time predicted by governor (based on next events and
> > > state target
> > > residency) to ARM trusted firmware.
> > 
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
> 
> it could also be in few ms depending on when next cpu event/activity might
> happen which is not transparent to MCE firmware.
> 
> > 
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> > 
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater that min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be at
> > least min_residency because of this C-state id.
> C6 is the only deepest state for Tegra194 Carmel CPU that we support in
> addition to C1 (WFI) idle state.
> 
> MCE firmware gets state crossover thresholds for C1 to C6 transition from
> TF-A and uses it along with state idle time to decide on C6 state entry
> based on its background work.
> 
> Assuming for now if we use min_residency as state idle time which is static
> value from DT, then it enters into deepest state C6 always as we use
> min_residency value we use is always higher than state crossover threshold.
> 
> But MCE firmware is not aware of when next cpu event can happen to predict
> if next event can take longer than state min_residency time.
> 
> Using min residency in such case is very conservative where MCE firmware
> exits C6 state early where we may not have better power saving.
> 
> But with MCE firmware being aware of when next event can happen it can use
> that to stay in C6 state without early exit for better power savings.
> 
> > It would depend on number of available states, max_residency, scale
> > that you would choose while assigning values from [0, max_residency]
> > to each state.
> > IIRC there can be many state IDs for idle, so it would depend on
> > number of bits encoding this state, and your needs. Example of
> > linear scale:
> > 4-bits encoding idle state and max predicted residency 10msec,
> > that means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using different than
> > linear function, to have some rage more precised.
> > 
> > Open question is if these idle states must be all represented
> > in DT, or there is a way of describing a 'set of idle states'
> > automatically.
> We only support C6 state through DT as C6 is the only deepest state for
> Tegra194 carmel CPU. WFI idle state is completely handled by kernel and does
> not require MCE sequences for entry/exit.

I think Lukasz's point is that you can encode the predicted idle time by
having multiple idle_state entries with different min_residency mapping
to the same actual idle-state. So you would several variants of C6 with
different min_residencies and if the OS picks one with longer
min_residency firmware would have a better estimate of the predicted
idle residency.

I'm not convinced it is the right way to work around passing this
information on to firmware. I would rather see an example of how well
this works (best with numbers) and have a proper solution.

Morten