lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 29 Apr 2024 22:21:33 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>,
 Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
 "Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH v1] thermal: core: Move passive polling management to the
 core

Hi Rafael,

On 4/25/24 15:11, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> Passive polling is enabled by setting the 'passive' field in
> struct thermal_zone_device to a positive value so long as the
> 'passive_delay_jiffies' field is greater than zero.  It causes
> the thermal core to actively check the thermal zone temperature
> periodically which in theory should be done after crossing a
> passive trip point on the way up in order to allow governors to
> react more rapidly to temperature changes and adjust mitigation
> more precisely.
> 
> However, the 'passive' field in struct thermal_zone_device is currently
> managed by governors which is quite problematic.  First of all, only
> two governors, Step-Wise and Power Allocator, update that field at
> all, so the other governors do not benefit from passive polling,
> although in principle they should.  Moreover, if the zone governor is
> changed from, say, Step-Wise to Fair-Share after 'passive' has been
> incremented by the former, it is not going to be reset back to zero by
> the latter even if the zone temperature falls down below all passive
> trip points.
> 
> For this reason, make handle_thermal_trip() increment 'passive'
> to enable passive polling for the given thermal zone whenever a
> passive trip point is crossed on the way up and decrement it
> whenever a passive trip point is crossed on the way down.  Also
> remove the 'passive' field updates from governors and additionally
> clear it in thermal_zone_device_init() to prevent passive polling
> from being enabled after a system resume just beacuse it was enabled
> before suspending the system.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
> 
> This has been mentioned here:
> 
> https://lore.kernel.org/linux-pm/61560bc6-d453-4b0c-a4ea-b375d547b143@linaro.org/
> 
> and I need someone to double check if the Power Allocator governor does not
> need to be adjusted more for this change.
> 
> ---
>   drivers/thermal/gov_power_allocator.c |   12 +++++++-----
>   drivers/thermal/gov_step_wise.c       |   10 ----------
>   drivers/thermal/thermal_core.c        |   10 ++++++++--
>   3 files changed, 15 insertions(+), 17 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -389,6 +389,9 @@ static void handle_thermal_trip(struct t
>   		if (tz->temperature < trip->temperature - trip->hysteresis) {
>   			list_add(&td->notify_list_node, way_down_list);
>   			td->notify_temp = trip->temperature - trip->hysteresis;
> +
> +			if (trip->type == THERMAL_TRIP_PASSIVE)
> +				tz->passive--;

This gets negative values and than the core switches to fast 'polling'
mode. The values is decremented from 0 each time the
thermal_zone_device_enable() is called.

Then IPA is actually called every 100ms after boot w/ low temp,
while it should 1s.

Please see the logs below:
'short log' after boot
----------------------------------------------

[    1.632670] thermal_sys: TZ: tz_id=0 passive-- = -1
[    1.637984] thermal_sys: TZ: tz_id=0 passive-- = -2
[    1.643641] thermal_sys: TZ: tz_id=1 passive-- = -1
----------------------------------------------

long log with call stack dumped
----------------------------------------------

[    1.632973] thermal_sys: TZ: tz_id=0 passive-- = -1
[    1.638295] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
[    1.645409] Hardware name: Radxa ROCK 4SE (DT)
[    1.650376] Call trace:
[    1.653109]  dump_backtrace+0x9c/0x100
[    1.657309]  show_stack+0x20/0x38
[    1.661017]  dump_stack_lvl+0xc0/0xd0
[    1.665119]  dump_stack+0x18/0x28
[    1.668828]  __thermal_zone_device_update+0x1fc/0x550
[    1.674484]  thermal_zone_device_set_mode+0x64/0xc0
[    1.679943]  thermal_zone_device_enable+0x1c/0x30
[    1.685206]  thermal_of_zone_register+0x34c/0x590
[    1.690473]  devm_thermal_of_zone_register+0x6c/0xc0
[    1.696031]  rockchip_thermal_probe+0x238/0x5e8
[    1.701106]  platform_probe+0x70/0xe8
[    1.705208]  really_probe+0xc4/0x278
[    1.709205]  __driver_probe_device+0x80/0x140
[    1.714078]  driver_probe_device+0x48/0x130
[    1.718756]  __driver_attach+0x7c/0x138
[    1.723045]  bus_for_each_dev+0x80/0xf0
[    1.727342]  driver_attach+0x2c/0x40
[    1.731340]  bus_add_driver+0xec/0x1f8
[    1.735539]  driver_register+0x68/0x138
[    1.739828]  __platform_driver_register+0x30/0x48
[    1.745093]  rockchip_thermal_driver_init+0x24/0x38
[    1.750551]  do_one_initcall+0x50/0x2d8
[    1.754844]  kernel_init_freeable+0x204/0x440
[    1.759722]  kernel_init+0x28/0x140
[    1.763631]  ret_from_fork+0x10/0x20
[    1.767802] thermal_sys: TZ: tz_id=0 passive-- = -2
[    1.773086] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
[    1.780196] Hardware name: Radxa ROCK 4SE (DT)
[    1.785162] Call trace:
[    1.787893]  dump_backtrace+0x9c/0x100
[    1.792087]  show_stack+0x20/0x38
[    1.795795]  dump_stack_lvl+0xc0/0xd0
[    1.799895]  dump_stack+0x18/0x28
[    1.803604]  __thermal_zone_device_update+0x1fc/0x550
[    1.809257]  thermal_zone_device_set_mode+0x64/0xc0
[    1.814715]  thermal_zone_device_enable+0x1c/0x30
[    1.819977]  thermal_of_zone_register+0x34c/0x590
[    1.825242]  devm_thermal_of_zone_register+0x6c/0xc0
[    1.830799]  rockchip_thermal_probe+0x238/0x5e8
[    1.835874]  platform_probe+0x70/0xe8
[    1.839973]  really_probe+0xc4/0x278
[    1.843972]  __driver_probe_device+0x80/0x140
[    1.848846]  driver_probe_device+0x48/0x130
[    1.853524]  __driver_attach+0x7c/0x138
[    1.857813]  bus_for_each_dev+0x80/0xf0
[    1.862110]  driver_attach+0x2c/0x40
[    1.866109]  bus_add_driver+0xec/0x1f8
[    1.870307]  driver_register+0x68/0x138
[    1.874597]  __platform_driver_register+0x30/0x48
[    1.879861]  rockchip_thermal_driver_init+0x24/0x38
[    1.885317]  do_one_initcall+0x50/0x2d8
[    1.889609]  kernel_init_freeable+0x204/0x440
[    1.894486]  kernel_init+0x28/0x140
[    1.898394]  ret_from_fork+0x10/0x20
[    1.902879] thermal_sys: TZ: tz_id=1 passive-- = -1
[    1.908172] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
[    1.915282] Hardware name: Radxa ROCK 4SE (DT)
[    1.920248] Call trace:
[    1.922979]  dump_backtrace+0x9c/0x100
[    1.927176]  show_stack+0x20/0x38
[    1.930883]  dump_stack_lvl+0xc0/0xd0
[    1.934982]  dump_stack+0x18/0x28
[    1.938691]  __thermal_zone_device_update+0x1fc/0x550
[    1.944342]  thermal_zone_device_set_mode+0x64/0xc0
[    1.949801]  thermal_zone_device_enable+0x1c/0x30
[    1.955063]  thermal_of_zone_register+0x34c/0x590
[    1.960328]  devm_thermal_of_zone_register+0x6c/0xc0
[    1.965886]  rockchip_thermal_probe+0x238/0x5e8
[    1.970951]  platform_probe+0x70/0xe8
[    1.975049]  really_probe+0xc4/0x278
[    1.979039]  __driver_probe_device+0x80/0x140
[    1.983911]  driver_probe_device+0x48/0x130
[    1.988589]  __driver_attach+0x7c/0x138
[    1.992880]  bus_for_each_dev+0x80/0xf0
[    1.997176]  driver_attach+0x2c/0x40
[    2.001173]  bus_add_driver+0xec/0x1f8
[    2.005372]  driver_register+0x68/0x138
[    2.009663]  __platform_driver_register+0x30/0x48
[    2.014927]  rockchip_thermal_driver_init+0x24/0x38
[    2.020383]  do_one_initcall+0x50/0x2d8
[    2.024674]  kernel_init_freeable+0x204/0x440
[    2.029549]  kernel_init+0x28/0x140
[    2.033456]  ret_from_fork+0x10/0x20

------------------------------------------


IMO we should check something like:
----------------------8<--------------------------
if (tz->passive)
	tz->passive--;
--------------------->8---------------------------
because we start from '0' as init value.

>   		} else {
>   			td->threshold -= trip->hysteresis;
>   		}
> @@ -402,8 +405,10 @@ static void handle_thermal_trip(struct t
>   		td->notify_temp = trip->temperature;
>   		td->threshold -= trip->hysteresis;
>   
> -		if (trip->type == THERMAL_TRIP_CRITICAL ||
> -		    trip->type == THERMAL_TRIP_HOT)
> +		if (trip->type == THERMAL_TRIP_PASSIVE)
> +			tz->passive++;

Also, we have to make sure here, that we are align after some change
and be able to return with value to 0 (to switch to 'passive'
slow checks).

Regards,
Lukasz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ