lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5f77b5b9-5d95-4512-a9fe-e0745f478d34@arm.com>
Date: Tue, 30 Apr 2024 15:58:25 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>,
 Daniel Lezcano <daniel.lezcano@...aro.org>,
 Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] thermal: core: Move passive polling management to the
 core



On 4/30/24 14:48, Rafael J. Wysocki wrote:
> Hi Lukasz,
> 
> On Mon, Apr 29, 2024 at 11:21 PM Lukasz Luba <lukasz.luba@....com> wrote:
>>
>> Hi Rafael,
>>
>> On 4/25/24 15:11, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>
>>> Passive polling is enabled by setting the 'passive' field in
>>> struct thermal_zone_device to a positive value so long as the
>>> 'passive_delay_jiffies' field is greater than zero.  It causes
>>> the thermal core to actively check the thermal zone temperature
>>> periodically which in theory should be done after crossing a
>>> passive trip point on the way up in order to allow governors to
>>> react more rapidly to temperature changes and adjust mitigation
>>> more precisely.
>>>
>>> However, the 'passive' field in struct thermal_zone_device is currently
>>> managed by governors which is quite problematic.  First of all, only
>>> two governors, Step-Wise and Power Allocator, update that field at
>>> all, so the other governors do not benefit from passive polling,
>>> although in principle they should.  Moreover, if the zone governor is
>>> changed from, say, Step-Wise to Fair-Share after 'passive' has been
>>> incremented by the former, it is not going to be reset back to zero by
>>> the latter even if the zone temperature falls down below all passive
>>> trip points.
>>>
>>> For this reason, make handle_thermal_trip() increment 'passive'
>>> to enable passive polling for the given thermal zone whenever a
>>> passive trip point is crossed on the way up and decrement it
>>> whenever a passive trip point is crossed on the way down.  Also
>>> remove the 'passive' field updates from governors and additionally
>>> clear it in thermal_zone_device_init() to prevent passive polling
>>> from being enabled after a system resume just beacuse it was enabled
>>> before suspending the system.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>> ---
>>>
>>> This has been mentioned here:
>>>
>>> https://lore.kernel.org/linux-pm/61560bc6-d453-4b0c-a4ea-b375d547b143@linaro.org/
>>>
>>> and I need someone to double check if the Power Allocator governor does not
>>> need to be adjusted more for this change.
>>>
>>> ---
>>>    drivers/thermal/gov_power_allocator.c |   12 +++++++-----
>>>    drivers/thermal/gov_step_wise.c       |   10 ----------
>>>    drivers/thermal/thermal_core.c        |   10 ++++++++--
>>>    3 files changed, 15 insertions(+), 17 deletions(-)
>>>
>>> Index: linux-pm/drivers/thermal/thermal_core.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/thermal/thermal_core.c
>>> +++ linux-pm/drivers/thermal/thermal_core.c
>>> @@ -389,6 +389,9 @@ static void handle_thermal_trip(struct t
>>>                if (tz->temperature < trip->temperature - trip->hysteresis) {
>>>                        list_add(&td->notify_list_node, way_down_list);
>>>                        td->notify_temp = trip->temperature - trip->hysteresis;
>>> +
>>> +                     if (trip->type == THERMAL_TRIP_PASSIVE)
>>> +                             tz->passive--;
>>
>> This gets negative values and than the core switches to fast 'polling'
>> mode. The values is decremented from 0 each time the
>> thermal_zone_device_enable() is called.
> 
> Interesting.
> 
> This shouldn't happen because it means that the passive trip has been
> crossed on the way down, but it wasn't crossed on the way up.
> 
> It looks like an initialization issue to me.
> 
>> Then IPA is actually called every 100ms after boot w/ low temp,
>> while it should 1s.
>>
>> Please see the logs below:
>> 'short log' after boot
>> ----------------------------------------------
>>
>> [    1.632670] thermal_sys: TZ: tz_id=0 passive-- = -1
>> [    1.637984] thermal_sys: TZ: tz_id=0 passive-- = -2
>> [    1.643641] thermal_sys: TZ: tz_id=1 passive-- = -1
>> ----------------------------------------------
>>
>> long log with call stack dumped
>> ----------------------------------------------
>>
>> [    1.632973] thermal_sys: TZ: tz_id=0 passive-- = -1
>> [    1.638295] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
>> [    1.645409] Hardware name: Radxa ROCK 4SE (DT)
>> [    1.650376] Call trace:
>> [    1.653109]  dump_backtrace+0x9c/0x100
>> [    1.657309]  show_stack+0x20/0x38
>> [    1.661017]  dump_stack_lvl+0xc0/0xd0
>> [    1.665119]  dump_stack+0x18/0x28
>> [    1.668828]  __thermal_zone_device_update+0x1fc/0x550
>> [    1.674484]  thermal_zone_device_set_mode+0x64/0xc0
>> [    1.679943]  thermal_zone_device_enable+0x1c/0x30
>> [    1.685206]  thermal_of_zone_register+0x34c/0x590
> 
> So let's see.
> 
> thermal_of_zone_register() calls
> thermal_zone_device_register_with_trips() which calls
> thermal_zone_device_update() for the first time, but
> __thermal_zone_device_update() sees that
> thermal_zone_device_is_enabled() returns false, so it does nothing.
> 
> This is right after thermal_zone_device_init() has been called, so
> tz->temperature == THERMAL_TEMP_INVALID and tz->passive == 0.
> 
> Next, thermal_zone_device_enable() is called by
> thermal_of_zone_register() and it calls __thermal_zone_device_update()
> via thermal_zone_device_set_mode().
> 
> This time thermal_zone_device_is_enabled() returns true, so
> update_temperature() is called and, unless __thermal_zone_get_temp()
> returns an error, it sets tz->last_temperature to THERMAL_TEMP_INVALID
> and tz->temperature to the current zone temperature.
> 
> Next, handle_thermal_trip() is called for all trips and it sees that
> tz->last_temperature == THERMAL_TEMP_INVALID, so it skips the branch
> in which tz->passive is decremented.
> 
> The only case I can see in which something else can happen in when
> __thermal_zone_get_temp() called by update_temperature() returns an
> error code (and if it is -EAGAIN, it won't even trigger a warning
> message) in which case the error is silently discarded and
> __thermal_zone_device_update() happily proceeds with tz->temperature
> == THERMAL_TEMP_INVALID and tz->last_temperature == 0.

That correct.

> 
> This can lead to many surprises down the road, so IMV
> __thermal_zone_device_update() should abort if it sees tz->temperature
> == THERMAL_TEMP_INVALID past calling update_temperature().

agree

> 
> So I'm wondering if the patch below (modulo white-space damage from
> GMail) helps.
> 
>> [    1.690473]  devm_thermal_of_zone_register+0x6c/0xc0
>> [    1.696031]  rockchip_thermal_probe+0x238/0x5e8
>> [    1.701106]  platform_probe+0x70/0xe8
>> [    1.705208]  really_probe+0xc4/0x278
>> [    1.709205]  __driver_probe_device+0x80/0x140
>> [    1.714078]  driver_probe_device+0x48/0x130
>> [    1.718756]  __driver_attach+0x7c/0x138
>> [    1.723045]  bus_for_each_dev+0x80/0xf0
>> [    1.727342]  driver_attach+0x2c/0x40
>> [    1.731340]  bus_add_driver+0xec/0x1f8
>> [    1.735539]  driver_register+0x68/0x138
>> [    1.739828]  __platform_driver_register+0x30/0x48
>> [    1.745093]  rockchip_thermal_driver_init+0x24/0x38
>> [    1.750551]  do_one_initcall+0x50/0x2d8
>> [    1.754844]  kernel_init_freeable+0x204/0x440
>> [    1.759722]  kernel_init+0x28/0x140
>> [    1.763631]  ret_from_fork+0x10/0x20
>> [    1.767802] thermal_sys: TZ: tz_id=0 passive-- = -2
> 
> ---
>   drivers/thermal/thermal_core.c |    3 +++
>   1 file changed, 3 insertions(+)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -496,6 +496,9 @@ void __thermal_zone_device_update(struct
> 
>       update_temperature(tz);
> 
> +    if (tz->temperature == THERMAL_TEMP_INVALID)
> +        return;
> +
>       tz->notify_event = event;
> 
>       for_each_trip_desc(tz, td)

Yes, it prevents that previous situation. I have added a debug print
before your return in the code above and the it's in the log:

[    1.632520] thermal_sys: TZ: THERMAL_TEMP_INVALID, return
[    1.638899] thermal_sys: TZ: THERMAL_TEMP_INVALID, return

The tracing also shows that we have only 1s slow polling.
It also works properly in IPA when crossing 2 trip points
and coming back to low temp.

So, that code above should be OK. Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ