lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Wed, 24 Jan 2024 01:14:51 +0100
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: Alexey Charkov <alchark@...il.com>
Cc: Rob Herring <robh+dt@...nel.org>,
 Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
 Conor Dooley <conor+dt@...nel.org>, Heiko Stuebner <heiko@...ech.de>,
 Sebastian Reichel <sebastian.reichel@...labora.com>,
 Cristian Ciocaltea <cristian.ciocaltea@...labora.com>,
 Christopher Obbard <chris.obbard@...labora.com>,
 Tamás Szűcs <szucst@....uni-miskolc.hu>,
 Shreeya Patel <shreeya.patel@...labora.com>,
 Kever Yang <kever.yang@...k-chips.com>, Jagan Teki <jagan@...eble.ai>,
 Chris Morgan <macromorgan@...mail.com>, devicetree@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] arm64: dts: rockchip: enable built-in thermal
 monitoring on rk3588

On 23/01/2024 20:47, Alexey Charkov wrote:
> On Mon, Jan 22, 2024 at 4:04 AM Daniel Lezcano
> <daniel.lezcano@...aro.org> wrote:
>>
>>
>> Hi Alexey,
>>
>>
>> On 21/01/2024 20:57, Alexey Charkov wrote:
>>> On Fri, Jan 19, 2024 at 8:21 PM Daniel Lezcano
>>> <daniel.lezcano@...aro.org> wrote:
>>> Hello Daniel,
>>>
>>> Thanks a lot for your review and comments! Please see some reflections below.
>>>
>>>> On 09/01/2024 20:19, Alexey Charkov wrote:
>>>>> Include thermal zones information in device tree for rk3588 variants
>>>>> and enable the built-in thermal sensing ADC on RADXA Rock 5B
>>>>>
>>>>> Signed-off-by: Alexey Charkov <alchark@...il.com>
>>>>> ---
>>>>> Changes in v2:
>>>>>     - Dropped redundant comments
>>>>>     - Included all CPU cores in cooling maps
>>>>>     - Split cooling maps into more granular ones utilizing TSADC
>>>>>       channels 1-3 which measure temperature by separate CPU clusters
>>>>>       instead of channel 0 which measures the center of the SoC die
>>>>> ---
>>>>>     .../boot/dts/rockchip/rk3588-rock-5b.dts      |   4 +
>>>>>     arch/arm64/boot/dts/rockchip/rk3588s.dtsi     | 151 ++++++++++++++++++
>>>>>     2 files changed, 155 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>>>>> index a5a104131403..f9d540000de3 100644
>>>>> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>>>>> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>>>>> @@ -772,3 +772,7 @@ &usb_host1_ehci {
>>>>>     &usb_host1_ohci {
>>>>>         status = "okay";
>>>>>     };
>>>>> +
>>>>> +&tsadc {
>>>>> +     status = "okay";
>>>>> +};
>>>>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>>>>> index 8aa0499f9b03..8d54998d0ecc 100644
>>>>> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>>>>> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>>>>> @@ -10,6 +10,7 @@
>>>>>     #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>>>>>     #include <dt-bindings/phy/phy.h>
>>>>>     #include <dt-bindings/ata/ahci.h>
>>>>> +#include <dt-bindings/thermal/thermal.h>
>>>>>
>>>>>     / {
>>>>>         compatible = "rockchip,rk3588";
>>>>> @@ -2112,6 +2113,156 @@ tsadc: tsadc@...00000 {
>>>>>                 status = "disabled";
>>>>>         };
>>>>>
>>>>> +     thermal_zones: thermal-zones {
>>>>> +             /* sensor near the center of the whole chip */
>>>>> +             soc_thermal: soc-thermal {
>>>>> +                     polling-delay-passive = <20>;
>>>>
>>>> There is no mitigation set for this thermal zone. It is pointless to
>>>> specify a passive polling.
>>>
>>> Indeed, it makes sense to me. There seems to be a catch though in that
>>> the driver calls the generic thermal_of_zone_register during the
>>> initial probe, which expects both of those polling delays to be
>>> present in the device tree, otherwise it simply refuses to add the
>>> respective thermal zone, see drivers/thermal/thermal_of.c:502
>>
>> Usually:
>>
>> polling-delay-passive = <0>;
>> polling-delay = <0>;
>>
>> cf:
>>
>> git grep "polling-delay = <0>" arch/arm64/boot/dts
> 
> For some reason when I have both polling-delay-passive and
> polling-delay set to 0, the active cooling map I have in my board DT
> (using a PWM controlled fan) behaves weirdly.



> I use the following fragment in my board DTS:
> 
> +&package_thermal {
> +       trips {
> +               package_fan: package-fan {
> +                       temperature = <55000>;
> +                       hysteresis = <2000>;
> +                       type = "active";
> +               };
> +       };
> +
> +       cooling-maps {
> +               map-fan {
> +                       trip = <&package_fan>;
> +                       cooling-device = <&fan THERMAL_NO_LIMIT
> THERMAL_NO_LIMIT>;
> +               };
> +       };
> +};
> 
> If I add polling-delay = <1000>; at the top, the fan speeds up and
> down dynamically as the package temperature swings around 55C. If I
> remove that (having set polling-delay = <0>; in rk3588s.dtsi), the fan
> speeds up to the midpoint cooling state once the package temperature
> approaches 55C, and then it just stays there forever: it doesn't speed
> up above the midpoint even as the temperature climbs above 70C, nor
> does it spin down as it falls back to around 45C.
> 
> Is that the expected behavior for when the polling is disabled?

I don't know the rest of the DT this fragment was added to, but I'm not 
surprised there is misbehavior because the configuration is not correct 
in this case.

If there is a thermal zone with an active trip and an associated cooling 
device like a fan, then:
	-> polling-delay = <a_value>;
	-> polling-delay-passive = <0>;

If there is a thermal zone with a passive cooling device like cpufreq 
cooling device, then 2 cases:

  1. The sensor supports interrupt when crossing the trip point
	-> polling-delay = <0>;
	-> polling-delay-passive = <a_value>;

  2. The sensor does not support interrupt when crossing the trip point
	-> polling-delay = <a_value>;
	-> polling-delay-passive = <another_value>;

Why?

When the cooling device is a passive cooling device, then the mitigation 
happens with a higher temperature sampling rate in order to change the 
state of the cooling device hundred of times per second. On a fan, the 
cooling effect is too slow for that so we keep the polling for that.


> I haven't yet studied in detail if passive cooling kicks in correctly
> with polling disabled, but this behavior with active cooling left me
> quite confused - any pointers would be much appreciated.
> 
> Thanks a lot,
> Alexey

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ