Message-ID: <CAD=FV=XzTSxg9sAtUcDhoLnY736u1qGKJy4OwLKp56_ruSUUvQ@mail.gmail.com>
Date: Thu, 17 Feb 2022 08:50:10 -0800
From: Doug Anderson <dianders@...omium.org>
To: Lukasz Luba <lukasz.luba@....com>
Cc: Matthias Kaehlcke <mka@...omium.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux PM <linux-pm@...r.kernel.org>,
amit daniel kachhap <amit.kachhap@...il.com>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Amit Kucheria <amitk@...nel.org>,
Zhang Rui <rui.zhang@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Pierre.Gondois@....com, Stephen Boyd <swboyd@...omium.org>,
Rajendra Nayak <rnayak@...eaurora.org>,
Bjorn Andersson <bjorn.andersson@...aro.org>
Subject: Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in
cpufreq_cooling and devfreq_cooling
Hi,

On Wed, Feb 16, 2022 at 3:28 PM Lukasz Luba <lukasz.luba@....com> wrote:
>
> On 2/16/22 5:21 PM, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Feb 8, 2022 at 1:32 AM Lukasz Luba <lukasz.luba@....com> wrote:
> >>
> >>> Another important thing is the consistent scale of the power values
> >>> provided by the cooling devices. All of the cooling devices in a single
> >>> thermal zone should have power values reported either in milli-Watts
> >>> or scaled to the same 'abstract scale'.
> >>
> >> This can change. We have removed the userspace governor from the kernel
> >> recently. The trend is to implement thermal policy in FW. Dealing with
> >> some intermediate configurations complicates the design, and supporting
> >> the algorithm logic also becomes more complex.
> >
> > One thing that didn't get addressed is the whole "The trend is to
> > implement thermal policy in FW". I'm not sure I can get on board with
> > that trend. IMO "moving to FW" isn't a super great trend. FW is harder
> > to update than kernel and trying to keep it in sync with the kernel
> > isn't wonderful. Unless something _has_ to be in FW I personally
> > prefer it to be in the kernel.
>
> There are pros and cons for both approaches (as always).
>
> That said, there are some use cases where the kernel is not able to
> react fast enough, e.g. sudden power usage changes, which can push
> the power rail outside its required operating conditions.
> When we are talking about tough requirements for those power & thermal
> policies, the mechanism must be fast, precise and reliable.
>
> Here you can find Arm reference FW implementation and an IPA clone
> in there (I have been reviewing this) [1][2].
>
> As you can see there is a new FW feature set:
> "MPMM, Traffic-cop and Thermal management".
>
> Apart from Arm implementation, there are already known thermal
> monitoring mechanisms in HW/FW. Like in the new Qcom SoCs which
> are using this driver code [3]. The driver receives an interrupt
> about throttling conditions and just populates the thermal pressure.

Yeah, this has come up in another context recently too. Right now on
the Qcom SoCs I'm working with (sc7180 on Chromebooks) we've
essentially disabled all the HW/FW throttling (LMH), preferring to let
Linux manage things. We chose to do it this way with the assumption
that Linux would be able to make better decisions than the firmware
and it was easier to understand / update than an opaque
vendor-provided blob. LMH is still there with super high limits in
case Linux goofs up (we don't want to damage the CPU) but it's not the
primary means of throttling.

As you said, Linux reacts a bit slower, though I've heard that might
be fixed soon-ish? So far on sc7180 Chromebooks it hasn't been a
problem because we have more total thermal mass and the CPUs in sc7180
don't actually generate that much heat compared to other CPUs. We also
have thermal interrupts enabled, which helps. That being said,
improvements are certainly welcome!
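
To make the split described above concrete (Linux throttles first, LMH
stays as a backstop with much higher limits), here is a minimal sketch.
The thresholds and all names in it are made up for illustration; they
are not the real sc7180 values or any kernel API.

```c
#include <assert.h>

/* Hypothetical thresholds in millidegrees Celsius; NOT real sc7180 values. */
#define KERNEL_TRIP_MC   85000   /* kernel thermal governor starts throttling */
#define LMH_BACKSTOP_MC 110000   /* HW/FW limit, deliberately set much higher */

/* The backstop only makes sense if it sits above the kernel's trip point. */
static_assert(KERNEL_TRIP_MC < LMH_BACKSTOP_MC,
              "backstop must be above the kernel trip point");

enum throttler { THROTTLE_NONE, THROTTLE_KERNEL, THROTTLE_LMH };

/* Report which layer would be limiting CPU frequency at a given temperature. */
static enum throttler active_throttler(int temp_mc)
{
    if (temp_mc >= LMH_BACKSTOP_MC)
        return THROTTLE_LMH;     /* kernel failed to contain the heat */
    if (temp_mc >= KERNEL_TRIP_MC)
        return THROTTLE_KERNEL;  /* normal case: Linux manages throttling */
    return THROTTLE_NONE;
}
```

With these assumed numbers, anything between 85 C and 110 C is handled
by the kernel alone, and the firmware only steps in past that.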
> > ...although now that I re-read this, I'm not sure which firmware you
> > might be talking about. Is this the AP firmware, or some companion
> > chip / coprocessor? Even so, I'd still rather see things done in the
> > kernel when possible...
>
> It's FW running on a dedicated microprocessor. In Arm SoCs it's usually
> some Cortex-M. We communicate with it from the kernel via SCMI drivers
> (using shared memory and mailboxes). We recommend using the SCMI
> protocol to send e.g. a 'performance request' to the FW via a 'fast
> channel' instead of implementing PMIC and clock handling and doing
> the voltage & freq change in the kernel (using drivers & locking). That
> approach avoids costly locking and lets the task scheduler go via
> the SCMI cpufreq driver [4] and SCMI perf layer [5].
> We don't need a dedicated 'sugov' kthread in a Deadline policy to
> do that work and preempt the currently running task.
>
> IMHO the FW approach opens new opportunities.
>
> Regards,
> Lukasz
>
> [1] https://github.com/ARM-software/SCP-firmware/pull/588
> [2]
> https://github.com/ARM-software/SCP-firmware/pull/588/commits/59c62ead5eb66353ae805c367bfa86192e28c410
> [3]
> https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/cpufreq/qcom-cpufreq-hw.c#L287
> [4]
> https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L65
> [5]
> https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/firmware/arm_scmi/perf.c#L465