[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55aa69e8b9fb0868091e4b2651a99beb@manjaro.org>
Date: Thu, 25 Jan 2024 23:19:35 +0100
From: Dragan Simic <dsimic@...jaro.org>
To: Alexey Charkov <alchark@...il.com>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, Rob Herring
<robh+dt@...nel.org>, Krzysztof Kozlowski
<krzysztof.kozlowski+dt@...aro.org>, Conor Dooley <conor+dt@...nel.org>,
Heiko Stuebner <heiko@...ech.de>, devicetree@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal
monitoring on rk3588
On 2024-01-25 09:26, Alexey Charkov wrote:
> On Thu, Jan 25, 2024 at 1:56 AM Daniel Lezcano
> <daniel.lezcano@...aro.org> wrote:
>>
>> On 24/01/2024 21:30, Alexey Charkov wrote:
>> > Include thermal zones information in device tree for rk3588 variants
>>
>> There is an energy model for the CPUs. But finding out the sustainable
>> power may be a bit tricky. So I suggest to remove everything related
>> to
>> the power allocator in this change and propose a dedicated change with
>> all the power configuration (which includes proper k_p* coefficients
>> to
>> be set from userspace to have a flat mitigation figure).
>>
>> That implies removing the "contribution" properties in this
>> description.
>
> Alright, I'll just drop those "contribution" properties, thanks!
>
>> Some comments below but definitively this version is close to be ok.
>
> Yay! :)
>
>> > Signed-off-by: Alexey Charkov <alchark@...il.com>
>> > ---
>> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 165 ++++++++++++++++++++++++++++++
>> > 1 file changed, 165 insertions(+)
>> >
>> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > index 36b1b7acfe6a..131b9eb21398 100644
>> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > @@ -10,6 +10,7 @@
>> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>> > #include <dt-bindings/phy/phy.h>
>> > #include <dt-bindings/ata/ahci.h>
>> > +#include <dt-bindings/thermal/thermal.h>
>> >
>> > / {
>> > compatible = "rockchip,rk3588";
>> > @@ -2228,6 +2229,170 @@ tsadc: tsadc@...00000 {
>> > status = "disabled";
>> > };
>> >
>> > + thermal_zones: thermal-zones {
>> > + /* sensor near the center of the whole chip */
>> > + package_thermal: package-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 0>;
>> > +
>> > + trips {
>> > + package_crit: package-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between A76 cores 0 and 1 */
>> > + bigcore0_thermal: bigcore0-thermal {
>> > + polling-delay-passive = <20>;
>>
>> 20ms seems very short, is this value on purpose? Or just picked up
>> arbitrarily?
>
> Frankly, I simply used the value that Radxa's downstream DTS sets for
> my board. 100ms seem to work just as well.
I think that 100 ms should be just fine (and should lower the resulting
CPU usage a bit), especially because the resolution of the TSADCs is
quite low at 1 oC, according to the RK3588 TRM. If the resolution were
higher, sampling the temperatures more often might make more sense.
>> If it is possible, perhaps you should profile the temperature of these
>> thermal zones (CPUs ones). There is a tool in
>> <linuxdir>/tools/thermal/thermometer to do that.
>>
>> You can measure with 10ms sampling rate when running for instance
>> dhrystone pinned on b0 and b1, then on b2 and b3. And finally on the
>> small cluster.
>
> It seems tricky to isolate the effects from just one of the CPU
> clusters, as their individual thermal outputs are not that high.
>
> For my testing I disabled the fan (but didn't remove the heatsink to
> avoid wasting the thermal interface tape), and tried loading CPUs with
> stress-ng. Here are the observations:
> - Little cores alone stressed with 4 threads pegged to them with
> taskset never reach the throttling temperature (85C), and balance out
> around 60C
> - Either of the big core clusters stressed individually with 2
> threads pegged to them with taskset never reach the throttling
> temperature either
> - Four big cores with 4 threads heat up very slowly (>30 minutes to
> reach throttling temperature, I didn't have enough patience to let
> them actually reach it - maybe they never do)
> - Eight cores with 8 threads heat up to the throttling temperature
> within ~5 minutes (again, with the fan off), and then, as soon as just
> one of the big core clusters gets throttled, the temperature of all
> cores balances out just below the throttling threshold. In my
> observation cores 6,7 go from 2.4GHz down to 1.8GHz while the rest
> stay at their respective top performance states (2.4GHz for big cores
> 4,5 and 1.8GHz for little cores 0-3)
>
> Adding to it the fact that the temperature measurement resolution is
> not very granular (almost 1C) it's somewhat difficult to estimate how
> fast throttling action on a single cluster really brings its
> temperature within bounds, as they all affect each other at relevant
> temperature-load combinations. Perhaps it means that too granular
> polling doesn't add much value.
>
>> But if you don't have spare time and 20 is ok for you. Then it is fine
>> for me too.
>
> I guess I'll go for 100 as other upstream Rockchip .dtsi's do, given
> all of the above. Thanks for pointing this out!
>
>> Some nits below.
>>
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 1>;
>> > +
>> > + trips {
>> > + bigcore0_alert0: bigcore0-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore0_alert1: bigcore0-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore0_crit: bigcore0-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map0 {
>> > + trip = <&bigcore0_alert1>;
>> > + cooling-device =
>> > + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + contribution = <1024>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between A76 cores 2 and 3 */
>> > + bigcore2_thermal: bigcore2-thermal {
>> > + polling-delay-passive = <20>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 2>;
>> > +
>> > + trips {
>> > + bigcore2_alert0: bigcore2-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore2_alert1: bigcore2-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore2_crit: bigcore2-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map1 {
>>
>> s/map1/mpa0/
>
> Noted, thanks!
>
>> > + trip = <&bigcore2_alert1>;
>> > + cooling-device =
>> > + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + contribution = <1024>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between the four A55 cores */
>> > + little_core_thermal: littlecore-thermal {
>> > + polling-delay-passive = <20>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 3>;
>> > +
>> > + trips {
>> > + littlecore_alert0: littlecore-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + littlecore_alert1: littlecore-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + littlecore_crit: littlecore-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map2 {
>>
>> s/map2/map0/
>
> Noted, thanks!
Powered by blists - more mailing lists