Date:   Mon, 25 Jul 2022 00:39:09 +0200
From:   Niklas Söderlund 
        <niklas.soderlund@...natech.se>
To:     Daniel Lezcano <daniel.lezcano@...exp.org>
Cc:     daniel.lezcano@...aro.org, rafael@...nel.org, rui.zhang@...el.com,
        khilman@...libre.com, abailon@...libre.com, amitk@...nel.org,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        "open list:RENESAS R-CAR THERMAL DRIVERS" 
        <linux-renesas-soc@...r.kernel.org>
Subject: Re: [PATCH v1 17/33] thermal/drivers/rcar: Switch to new of API

Hi Daniel,

I tested your branch, unfortunately with the same result for 
rcar_gen3_thermal: writing to the emul_temp file does not trigger any 
actions.

If I revert the following two commits on top of your branch:

    409ca214f4c6bd5b ("thermal/of: Remove old OF code")
    7b43f76d3428227e ("thermal/drivers/rcar: Switch to new of API")

I'm able to 'restore' the behavior where I can change the cooling state 
and trigger the critical trip point using emul_temp to shut down the 
board.

As the change in question also affects the rcar_thermal sensor, I gave 
that a try too. The system I have has no cooling device for it, so my 
only test case is to write a temperature above the critical trip point 
to emul_temp and see if that shuts down the system. Just as with 
rcar_gen3_thermal, nothing happens with your series, while with the two 
commits listed above reverted the system shuts down.

    echo 110000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp

If it's any help, writing to emul_temp does have some effect, as the 
emulated temperature is read back from the temp sysfs file. For 
rcar_thermal, where the critical trip point is 95 degrees:

    * With this series
    # grep . /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_*
    /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_hyst:0
    /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_temp:95000
    /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_type:critical
    # cat /sys/devices/virtual/thermal/thermal_zone0/temp
    35000
    # echo 50000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
    # cat /sys/devices/virtual/thermal/thermal_zone0/temp
    50000
    # echo 110000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
    # cat /sys/devices/virtual/thermal/thermal_zone0/temp
    110000
    *** system alive ***

    * With this series and the two patches reverted or plain v5.19-rc4
    # grep .  /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_* 
    /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_hyst:0
    /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_temp:95000
    /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_type:critical
    # cat /sys/devices/virtual/thermal/thermal_zone0/temp
    35000
    # echo 50000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
    # cat /sys/devices/virtual/thermal/thermal_zone0/temp
    50000
    # echo 110000 > /sys/devices/virtual/thermal/thermal_zone0/emul_temp
    [  121.380054] thermal thermal_zone0: cpu-thermal: critical temperature reached, shutting down
    [  121.388482] reboot: HARDWARE PROTECTION shutdown (Temperature too high)
    *** system shuts down ***
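
For reference, the check I'm doing by hand in both listings above can be 
sketched as a small shell helper. This is only an illustration: 
check_zone is a name I made up, it is not part of any kernel tooling, 
and it assumes trip point 0 is the critical one, as it is on this 
system. It only compares the value read back from temp with the trip 
point; it does not trigger or detect the shutdown itself.

```shell
#!/bin/sh
# Sketch only: check_zone is a hypothetical helper, not an existing tool.
# Assumption: trip_point_0 is the critical trip point of the given zone.
check_zone() {
    zone="$1"                               # thermal zone sysfs directory
    temp=$(cat "$zone/temp")                # current (or emulated) temperature
    crit=$(cat "$zone/trip_point_0_temp")   # critical trip temperature
    if [ "$temp" -ge "$crit" ]; then
        echo "temp $temp >= critical $crit: shutdown expected"
    else
        echo "temp $temp < critical $crit: no action expected"
    fi
}

# Usage on the target, as root:
#   zone=/sys/devices/virtual/thermal/thermal_zone0
#   echo 110000 > "$zone/emul_temp"
#   check_zone "$zone"
```

With the series applied the helper would report that a shutdown is 
expected after writing 110000, while the system stays alive, which is 
exactly the mismatch described above.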

To make matters worse, I don't think the lack of action is limited to 
the emul_temp interface. With rcar_thermal I lowered the critical trip 
point value to 45C and used the cpuburn application to generate load 
and raise the temperature.

The result mirrors the findings above: with your branch the system does 
not trigger the critical trip point. If I revert the two commits or run 
plain v5.19-rc4, once the temperature reaches 45C the critical trip 
point kicks in and shuts down the system.

I hope this helps. I'm sorry I couldn't find the real issue by digging 
into the core changes. I'm happy to help find the root cause, and I 
think the idea behind the new API is good.

On 2022-07-24 23:11:47 +0200, Daniel Lezcano wrote:
> 
> Hi Niklas,
> 
> I gave it another try but failed to reproduce the issue. Perhaps my board has a
> path different from yours.
> 
> Thanks for proposing to test the series. I've uploaded the branch here:
> 
> https://github.com/dlezcano/linux-thermal
> 
> 
> On 24/07/2022 21:00, Niklas Söderlund wrote:
> > Hi Daniel,
> > 
> > On 2022-07-24 20:27:54 +0200, Daniel Lezcano wrote:
> > > Hi Niklas,
> > > 
> > > I tried to reproduce the issue but without success.
> > > 
> > > What sensor are you using ?
> > I was using rcar_gen3_thermal.
> > 
> > I did my tests starting on v5.19-rc7 and then picked '[PATCH v5 00/12]
> > thermal OF rework' from [1] and finally applied this full series on-top
> > of that. If you have a branch or some specific test you wish me to try
> > I'm happy to do so.
> > 
> > 1. https://lore.kernel.org/lkml/20220710123512.1714714-1-daniel.lezcano@linexp.org/
> > 
> > > 
> > > On 19/07/2022 11:10, Niklas Söderlund wrote:
> > > > Hi Daniel,
> > > > 
> > > > Thanks for your work.
> > > > 
> > > > On 2022-07-10 23:24:07 +0200, Daniel Lezcano wrote:
> > > > > The thermal OF code has a new API allowing to migrate the OF
> > > > > initialization to a simpler approach.
> > > > > 
> > > > > Use this new API.
> > > > I tested this together with the series it depends on and while
> > > > temperature monitoring seems to work fine it breaks the emul_temp
> > > > interface (/sys/class/thermal/thermal_zone2/emul_temp).
> > > > 
> > > > Before this change I can write a temperature to this file and have it
> > > > trigger actions, in my test-case changing the cooling state, which I
> > > > observe in /sys/class/thermal/cooling_device0/cur_state.
> > > > 
> > > > Likewise before this change I could trip the critical trip-point that
> > > > would power off the board using the emul_temp interface, this too no
> > > > longer works,
> > > > 
> > > >       echo 120000 > /sys/class/thermal/thermal_zone2/emul_temp
> > > > 
> > > > Is this an intentional change of the new API?
> > > 
> > > 
> > > 
> 

-- 
Kind Regards,
Niklas Söderlund
