lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 13 Nov 2023 14:04:34 +0100
From:   Mateusz Majewski <m.majewski2@...sung.com>
To:     linux-pm@...r.kernel.org, linux-samsung-soc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Cc:     Mateusz Majewski <m.majewski2@...sung.com>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Zhang Rui <rui.zhang@...el.com>,
        Lukasz Luba <lukasz.luba@....com>,
        Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
        Bartlomiej Zolnierkiewicz <bzolnier@...il.com>,
        Marek Szyprowski <m.szyprowski@...sung.com>
Subject: Various Exynos targets never return to no cooling

Hi,

While working on some fixes on the Exynos thermal driver, I have found that some
of the Exynos-based boards will never return to no cooling. That is, after
heating the board a bit and letting it cool, we see in the sysfs output similar
to this:

root@...get:~# cat /sys/class/thermal/thermal_zone*/temp
30000
29000
32000
31000
30000
root@...get:~# cat /sys/class/thermal/cooling_device*/cur_state
1
0
0
0

This is on the Odroid XU4 board, where the lowest trip point is 50 deg. C.
Similar behavior happens on some other boards, for example TM2E. The issue
happens when the stepwise governor is being used and I have not tested the
behavior of the other governors.

I have attempted to fix this myself, but the issue seems somewhat complex and
over my level of understanding. I did some debugging, and here is what I think
is happening:

1. Since there is no temperature polling enabled on the mentioned boards, the
   governor will only be called when a trip point is being passed.
2. The board heats up and a couple trip points get passed through. Each time,
   the governor is called for each trip point.
3. For the lowest thermal instance, it will find out that the temperature is
   higher than the lowest trip point (i.e. throttle is true), and that the trend
   is THERMAL_TREND_RAISING. Therefore, it will attempt to increase the target
   state each time and the state will be set to the higher limit.
4. Let's now say that the temperature starts falling, which means that the trip
   points get passed from the other side. Again, the governor will be called for
   each trip point.
5. For the lowest thermal instance, the trend will be THERMAL_TREND_DROPPING. The
   temperature will be higher than the lowest trip point all but one time (i.e.
   throttle will be true). This will mean that in these cases, nothing will
   happen and the state will remain at the higher limit.
6. Finally, when the lowest trip point is passed and the governor is called for
   its thermal instance, the trend will still be THERMAL_TREND_DROPPING and the
   temperature will be lower than the trip point (i.e. throttle will be false).
   Therefore, the governor will reduce the state, but it is unlikely that this
   will result in deactivation of the thermal instance, since the state has been
   at the higher limit up until this point.
7. Now the governor will never be called anymore, and the state will never
   change from this point.

I have found two workarounds, but neither seem satisfactory:

1. The issue doesn't appear when at least two lowest trip points have their
   lower state limit equal to the higher state limit. For instance, for TM2E,
   the following change is enough for the issue to not appear:

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
index 81b72393dd0d..145c4c80893a 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
@@ -55,14 +55,14 @@ cooling-maps {
             map0 {
                 /* Set maximum frequency as 1800MHz  */
                 trip = <&atlas0_alert_0>;
-                cooling-device = <&cpu4 1 2>, <&cpu5 1 2>,
-                         <&cpu6 1 2>, <&cpu7 1 2>;
+                cooling-device = <&cpu4 1 1>, <&cpu5 1 1>,
+                         <&cpu6 1 1>, <&cpu7 1 1>;
             };
             map1 {
                 /* Set maximum frequency as 1700MHz  */
                 trip = <&atlas0_alert_1>;
-                cooling-device = <&cpu4 2 3>, <&cpu5 2 3>,
-                         <&cpu6 2 3>, <&cpu7 2 3>;
+                cooling-device = <&cpu4 2 2>, <&cpu5 2 2>,
+                         <&cpu6 2 2>, <&cpu7 2 2>;
             };
             map2 {
                 /* Set maximum frequency as 1600MHz  */
@@ -229,14 +229,14 @@ cooling-maps {
             map0 {
                 /* Set maximum frequency as 1200MHz  */
                 trip = <&apollo_alert_2>;
-                cooling-device = <&cpu0 1 2>, <&cpu1 1 2>,
-                         <&cpu2 1 2>, <&cpu3 1 2>;
+                cooling-device = <&cpu0 1 1>, <&cpu1 1 1>,
+                         <&cpu2 1 1>, <&cpu3 1 1>;
             };
             map1 {
                 /* Set maximum frequency as 1100MHz  */
                 trip = <&apollo_alert_3>;
-                cooling-device = <&cpu0 2 3>, <&cpu1 2 3>,
-                         <&cpu2 2 3>, <&cpu3 2 3>;
+                cooling-device = <&cpu0 2 2>, <&cpu1 2 2>,
+                         <&cpu2 2 2>, <&cpu3 2 2>;
             };
             map2 {
                 /* Set maximum frequency as 1000MHz  */

   Two trip points need to change and not only one, since the calculation in the
   governor is based on the maximum of all states and not only the state of a
   single instance. It's not clear if that would be enough in all cases, but
   this feels hacky anyway. Though since we only give the governor information
   when the trip point is passed, it does make some limited sense to make it
   simply set the state to a specific value instead of making decisions.
2. The issue also disappears when polling is enabled, since this means that the
   governor is called periodically. However, it would be great to not have to do
   so and keep using only interrupts, since we already have them in our SoC.

It seems that in the past, there has been an attempt to handle this case
differently: https://lore.kernel.org/all/1352348786-24255-1-git-send-email-amit.kachhap@linaro.org/
However it seems that the attempt has never been completed, and the remains
have been removed: https://lore.kernel.org/all/20220629151012.3115773-2-daniel.lezcano@linaro.org/

There also might be a race condition possible here, as it might be the case
that after the interrupt, when the thermal framework calls get_temp, the value
will already change to a value that would not trigger the trip point. This
could be problematic when the temperature is raising, as then the governor will
essentially ignore that trip point (trend will be RAISING, but throttle will be
false, so nothing will happen). It is less problematic when the temperature is
falling, as the temperature will be much lower than the trip point due to
hysteresis. However, for the Exynos 5433 SoC, hysteresis is unfortunately set
to 1 deg. C and the temperature values are also rounded to 1 deg. C. This means
that the race condition might also be possible in this direction on this SoC. I
have once managed to get the state stuck at 2 instead of the usual 1 on TM2E. I
have not investigated that further, but it seems that this race condition is a
good explanation of this behavior.

I feel very incompetent to attempt to resolve these issues, as I have only read
the thermal framework code for a bit. What do you think should be done here?

Thank you! :)
Mateusz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ