lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 8 Jan 2024 17:41:48 +0400
From: Alexey Charkov <alchark@...il.com>
To: Dragan Simic <dsimic@...jaro.org>
Cc: Rob Herring <robh+dt@...nel.org>, 
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>, Conor Dooley <conor+dt@...nel.org>, 
	Heiko Stuebner <heiko@...ech.de>, Sebastian Reichel <sebastian.reichel@...labora.com>, 
	Cristian Ciocaltea <cristian.ciocaltea@...labora.com>, 
	Christopher Obbard <chris.obbard@...labora.com>, Tamás Szűcs <szucst@....uni-miskolc.hu>, 
	Shreeya Patel <shreeya.patel@...labora.com>, Kever Yang <kever.yang@...k-chips.com>, 
	Chris Morgan <macromorgan@...mail.com>, devicetree@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: dts: rockchip: enable built-in thermal monitoring
 on rk3588

Hello Dragan,

Thanks a lot for your review and comments! Some reflections below.

On Sun, Jan 7, 2024 at 2:54 AM Dragan Simic <dsimic@...jaro.org> wrote:
>
> Hello Alexey,
>
> Please see my comments below.
>
> On 2024-01-06 23:23, Alexey Charkov wrote:
> > Include thermal zones information in device tree for rk3588 variants
> > and enable the built-in thermal sensing ADC on RADXA Rock 5B
> >
> > Signed-off-by: Alexey Charkov <alchark@...il.com>
> > ---
> >  .../boot/dts/rockchip/rk3588-rock-5b.dts      |   4 +
> >  arch/arm64/boot/dts/rockchip/rk3588s.dtsi     | 143 ++++++++++++++++++
> >  2 files changed, 147 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > index a5a104131403..f9d540000de3 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > @@ -772,3 +772,7 @@ &usb_host1_ehci {
> >  &usb_host1_ohci {
> >       status = "okay";
> >  };
> > +
> > +&tsadc {
> > +     status = "okay";
> > +};
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > index 8aa0499f9b03..8235991e3112 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > @@ -10,6 +10,7 @@
> >  #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> >  #include <dt-bindings/phy/phy.h>
> >  #include <dt-bindings/ata/ahci.h>
> > +#include <dt-bindings/thermal/thermal.h>
> >
> >  / {
> >       compatible = "rockchip,rk3588";
> > @@ -2112,6 +2113,148 @@ tsadc: tsadc@...00000 {
> >               status = "disabled";
> >       };
> >
> > +     thermal_zones: thermal-zones {
> > +             soc_thermal: soc-thermal {
>
> It should be better to name it cpu_thermal instead.  In the end, that's
> what it is.

The TRM document says the first TSADC channel (to which this section
applies) measures the temperature near the center of the SoC die,
which implies not only the CPU but also the GPU at least. RADXA's
kernel for Rock 5B also has GPU passive cooling as one of the cooling
maps under this node (not included here, as we don't have the GPU node
in .dtsi just yet). So perhaps naming this one cpu_thermal could be
misleading?

> > +                     polling-delay-passive = <20>; /* milliseconds */
> > +                     polling-delay = <1000>; /* milliseconds */
> > +                     sustainable-power = <2100>; /* milliwatts */
>
> These three comments above are pretty much redundant.

Noted, will drop in v2

> > +
> > +                     thermal-sensors = <&tsadc 0>;
>
> An empty line should be added here.

Noted, will add in v2

> > +                     trips {
> > +                             threshold: trip-point-0 {
>
> It should be better to name it cpu_alert0 instead, because that's what
> other newer dtsi files already use.

Reflecting on your comments here and below, I'm thinking that maybe it
would be better to define only the critical trip point for the SoC
overall, and then have alerts along with the respective cooling maps
separately for A76-0,1, A76-2,3, A55-0,1,2,3? After all, given that we
have more granular temperature measurement here than in previous RK
chipsets it might be better to only throttle the "offending" cores,
not the full package.

What do you think?

Downstream DT doesn't follow this approach though, so maybe there's
something I'm missing here.

> > +                                     temperature = <75000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             target: trip-point-1 {
>
> It should be better to name it cpu_alert1 instead, because that's what
> other newer dtsi files already use.
>
> > +                                     temperature = <85000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             soc_crit: soc-crit {
>
> It should be better to name it cpu_crit instead, because that's what
> other newer dtsi files already use.

Seems to me that if I define separate trips for the three groups of
CPU cores as mentioned above this would better stay as soc_crit, as it
applies to the whole die rather than the CPU cluster alone. Then
'threshold' and 'target' will go altogether, and I'll have separate
*_alert0 and *_alert1 per CPU group.

>> > +                                     /* millicelsius */
> > +                                     temperature = <115000>;
> > +                                     /* millicelsius */
>
> These two comments above are pretty much redundant.  It also applies to
> all other similar comments below.

Noted, thanks - will drop all the noise in v2.

> > +                                     hysteresis = <2000>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +                     cooling-maps {
> > +                             map0 {
> > +                                     trip = <&target>;
>
> Shouldn't &threshold (i.e. &cpu_alert0) be referenced here instead?
>
> > +                                     cooling-device = <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>
> Shouldn't all big CPU cores be listed here instead?

I guess if a separate trip point is defined for cpu_l0,1,2,3 then it
would need to throttle at 75C, and then cpu_b0,1 and cpu_b2,3 at 85C
each. Logic being that if a sensor stacked in the middle of a group of
four cores shows 75C then one of the four might well be in abnormal
temperature range already (85+), while sensors next to only two big
cores each will likely show temperatures similar to the actual core
temperature.

> > +                                     contribution = <1024>;
> > +                             };
> > +                             map1 {
> > +                                     trip = <&target>;
>
> Shouldn't &target (i.e. &cpu_alert1) be referenced here instead?
>
> > +                                     cooling-device = <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>
> Shouldn't all little and big CPU cores be listed here instead?
>
> > +                                     contribution = <1024>;
> > +                             };
> > +                             map2 {
> > +                                     trip = <&target>;
> > +                                     cooling-device = <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > +                                     contribution = <1024>;
> > +                             };
>
> Isn't this cooling map redundant?

Will refactor the above, thanks.

> > +                     };
> > +             };
> > +
> > +             bigcore0_thermal: bigcore0-thermal {
> > +                     polling-delay-passive = <20>; /* milliseconds */
> > +                     polling-delay = <1000>; /* milliseconds */
> > +                     thermal-sensors = <&tsadc 1>;
> > +
> > +                     trips {
> > +                             big0_crit: big0-crit {
> > +                                     /* millicelsius */
> > +                                     temperature = <115000>;
> > +                                     /* millicelsius */
> > +                                     hysteresis = <2000>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +             };
> > +
> > +             bigcore1_thermal: bigcore1-thermal {
> > +                     polling-delay-passive = <20>; /* milliseconds */
> > +                     polling-delay = <1000>; /* milliseconds */
> > +                     thermal-sensors = <&tsadc 2>;
> > +
> > +                     trips {
> > +                             big1_crit: big1-crit {
> > +                                     /* millicelsius */
> > +                                     temperature = <115000>;
> > +                                     /* millicelsius */
> > +                                     hysteresis = <2000>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +             };
> > +
> > +             little_core_thermal: littlecore-thermal {
> > +                     polling-delay-passive = <20>; /* milliseconds */
> > +                     polling-delay = <1000>; /* milliseconds */
> > +                     thermal-sensors = <&tsadc 3>;
> > +
> > +                     trips {
> > +                             little_crit: little-crit {
> > +                                     /* millicelsius */
> > +                                     temperature = <115000>;
> > +                                     /* millicelsius */
> > +                                     hysteresis = <2000>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +             };
> > +
> > +             center_thermal: center-thermal {
> > +                     polling-delay-passive = <20>; /* milliseconds */
> > +                     polling-delay = <1000>; /* milliseconds */
> > +                     thermal-sensors = <&tsadc 4>;
> > +
> > +                     trips {
> > +                             center_crit: center-crit {
> > +                                     /* millicelsius */
> > +                                     temperature = <115000>;
> > +                                     /* millicelsius */
> > +                                     hysteresis = <2000>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +             };
> > +
> > +             gpu_thermal: gpu-thermal {
> > +                     polling-delay-passive = <20>; /* milliseconds */
> > +                     polling-delay = <1000>; /* milliseconds */
> > +                     thermal-sensors = <&tsadc 5>;
> > +
> > +                     trips {
> > +                             gpu_crit: gpu-crit {
> > +                                     /* millicelsius */
> > +                                     temperature = <115000>;
> > +                                     /* millicelsius */
> > +                                     hysteresis = <2000>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +             };
> > +
> > +             npu_thermal: npu-thermal {
> > +                     polling-delay-passive = <20>; /* milliseconds */
> > +                     polling-delay = <1000>; /* milliseconds */
> > +                     thermal-sensors = <&tsadc 6>;
> > +
> > +                     trips {
> > +                             npu_crit: npu-crit {
> > +                                     /* millicelsius */
> > +                                     temperature = <115000>;
> > +                                     /* millicelsius */
> > +                                     hysteresis = <2000>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +             };
> > +     };
> > +
> >       saradc: adc@...10000 {
> >               compatible = "rockchip,rk3588-saradc";
> >               reg = <0x0 0xfec10000 0x0 0x10000>;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ