[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c29247b8-89d4-404a-b294-81f19720e236@arm.com>
Date: Mon, 29 Apr 2024 22:21:33 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>,
Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH v1] thermal: core: Move passive polling management to the
core
Hi Rafael,
On 4/25/24 15:11, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>
> Passive polling is enabled by setting the 'passive' field in
> struct thermal_zone_device to a positive value so long as the
> 'passive_delay_jiffies' field is greater than zero. It causes
> the thermal core to actively check the thermal zone temperature
> periodically which in theory should be done after crossing a
> passive trip point on the way up in order to allow governors to
> react more rapidly to temperature changes and adjust mitigation
> more precisely.
>
> However, the 'passive' field in struct thermal_zone_device is currently
> managed by governors which is quite problematic. First of all, only
> two governors, Step-Wise and Power Allocator, update that field at
> all, so the other governors do not benefit from passive polling,
> although in principle they should. Moreover, if the zone governor is
> changed from, say, Step-Wise to Fair-Share after 'passive' has been
> incremented by the former, it is not going to be reset back to zero by
> the latter even if the zone temperature falls down below all passive
> trip points.
>
> For this reason, make handle_thermal_trip() increment 'passive'
> to enable passive polling for the given thermal zone whenever a
> passive trip point is crossed on the way up and decrement it
> whenever a passive trip point is crossed on the way down. Also
> remove the 'passive' field updates from governors and additionally
> clear it in thermal_zone_device_init() to prevent passive polling
> from being enabled after a system resume just beacuse it was enabled
> before suspending the system.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
>
> This has been mentioned here:
>
> https://lore.kernel.org/linux-pm/61560bc6-d453-4b0c-a4ea-b375d547b143@linaro.org/
>
> and I need someone to double check if the Power Allocator governor does not
> need to be adjusted more for this change.
>
> ---
> drivers/thermal/gov_power_allocator.c | 12 +++++++-----
> drivers/thermal/gov_step_wise.c | 10 ----------
> drivers/thermal/thermal_core.c | 10 ++++++++--
> 3 files changed, 15 insertions(+), 17 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -389,6 +389,9 @@ static void handle_thermal_trip(struct t
> if (tz->temperature < trip->temperature - trip->hysteresis) {
> list_add(&td->notify_list_node, way_down_list);
> td->notify_temp = trip->temperature - trip->hysteresis;
> +
> + if (trip->type == THERMAL_TRIP_PASSIVE)
> + tz->passive--;
This gets negative values and than the core switches to fast 'polling'
mode. The values is decremented from 0 each time the
thermal_zone_device_enable() is called.
Then IPA is actually called every 100ms after boot w/ low temp,
while it should 1s.
Please see the logs below:
'short log' after boot
----------------------------------------------
[ 1.632670] thermal_sys: TZ: tz_id=0 passive-- = -1
[ 1.637984] thermal_sys: TZ: tz_id=0 passive-- = -2
[ 1.643641] thermal_sys: TZ: tz_id=1 passive-- = -1
----------------------------------------------
long log with call stack dumped
----------------------------------------------
[ 1.632973] thermal_sys: TZ: tz_id=0 passive-- = -1
[ 1.638295] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
[ 1.645409] Hardware name: Radxa ROCK 4SE (DT)
[ 1.650376] Call trace:
[ 1.653109] dump_backtrace+0x9c/0x100
[ 1.657309] show_stack+0x20/0x38
[ 1.661017] dump_stack_lvl+0xc0/0xd0
[ 1.665119] dump_stack+0x18/0x28
[ 1.668828] __thermal_zone_device_update+0x1fc/0x550
[ 1.674484] thermal_zone_device_set_mode+0x64/0xc0
[ 1.679943] thermal_zone_device_enable+0x1c/0x30
[ 1.685206] thermal_of_zone_register+0x34c/0x590
[ 1.690473] devm_thermal_of_zone_register+0x6c/0xc0
[ 1.696031] rockchip_thermal_probe+0x238/0x5e8
[ 1.701106] platform_probe+0x70/0xe8
[ 1.705208] really_probe+0xc4/0x278
[ 1.709205] __driver_probe_device+0x80/0x140
[ 1.714078] driver_probe_device+0x48/0x130
[ 1.718756] __driver_attach+0x7c/0x138
[ 1.723045] bus_for_each_dev+0x80/0xf0
[ 1.727342] driver_attach+0x2c/0x40
[ 1.731340] bus_add_driver+0xec/0x1f8
[ 1.735539] driver_register+0x68/0x138
[ 1.739828] __platform_driver_register+0x30/0x48
[ 1.745093] rockchip_thermal_driver_init+0x24/0x38
[ 1.750551] do_one_initcall+0x50/0x2d8
[ 1.754844] kernel_init_freeable+0x204/0x440
[ 1.759722] kernel_init+0x28/0x140
[ 1.763631] ret_from_fork+0x10/0x20
[ 1.767802] thermal_sys: TZ: tz_id=0 passive-- = -2
[ 1.773086] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
[ 1.780196] Hardware name: Radxa ROCK 4SE (DT)
[ 1.785162] Call trace:
[ 1.787893] dump_backtrace+0x9c/0x100
[ 1.792087] show_stack+0x20/0x38
[ 1.795795] dump_stack_lvl+0xc0/0xd0
[ 1.799895] dump_stack+0x18/0x28
[ 1.803604] __thermal_zone_device_update+0x1fc/0x550
[ 1.809257] thermal_zone_device_set_mode+0x64/0xc0
[ 1.814715] thermal_zone_device_enable+0x1c/0x30
[ 1.819977] thermal_of_zone_register+0x34c/0x590
[ 1.825242] devm_thermal_of_zone_register+0x6c/0xc0
[ 1.830799] rockchip_thermal_probe+0x238/0x5e8
[ 1.835874] platform_probe+0x70/0xe8
[ 1.839973] really_probe+0xc4/0x278
[ 1.843972] __driver_probe_device+0x80/0x140
[ 1.848846] driver_probe_device+0x48/0x130
[ 1.853524] __driver_attach+0x7c/0x138
[ 1.857813] bus_for_each_dev+0x80/0xf0
[ 1.862110] driver_attach+0x2c/0x40
[ 1.866109] bus_add_driver+0xec/0x1f8
[ 1.870307] driver_register+0x68/0x138
[ 1.874597] __platform_driver_register+0x30/0x48
[ 1.879861] rockchip_thermal_driver_init+0x24/0x38
[ 1.885317] do_one_initcall+0x50/0x2d8
[ 1.889609] kernel_init_freeable+0x204/0x440
[ 1.894486] kernel_init+0x28/0x140
[ 1.898394] ret_from_fork+0x10/0x20
[ 1.902879] thermal_sys: TZ: tz_id=1 passive-- = -1
[ 1.908172] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
[ 1.915282] Hardware name: Radxa ROCK 4SE (DT)
[ 1.920248] Call trace:
[ 1.922979] dump_backtrace+0x9c/0x100
[ 1.927176] show_stack+0x20/0x38
[ 1.930883] dump_stack_lvl+0xc0/0xd0
[ 1.934982] dump_stack+0x18/0x28
[ 1.938691] __thermal_zone_device_update+0x1fc/0x550
[ 1.944342] thermal_zone_device_set_mode+0x64/0xc0
[ 1.949801] thermal_zone_device_enable+0x1c/0x30
[ 1.955063] thermal_of_zone_register+0x34c/0x590
[ 1.960328] devm_thermal_of_zone_register+0x6c/0xc0
[ 1.965886] rockchip_thermal_probe+0x238/0x5e8
[ 1.970951] platform_probe+0x70/0xe8
[ 1.975049] really_probe+0xc4/0x278
[ 1.979039] __driver_probe_device+0x80/0x140
[ 1.983911] driver_probe_device+0x48/0x130
[ 1.988589] __driver_attach+0x7c/0x138
[ 1.992880] bus_for_each_dev+0x80/0xf0
[ 1.997176] driver_attach+0x2c/0x40
[ 2.001173] bus_add_driver+0xec/0x1f8
[ 2.005372] driver_register+0x68/0x138
[ 2.009663] __platform_driver_register+0x30/0x48
[ 2.014927] rockchip_thermal_driver_init+0x24/0x38
[ 2.020383] do_one_initcall+0x50/0x2d8
[ 2.024674] kernel_init_freeable+0x204/0x440
[ 2.029549] kernel_init+0x28/0x140
[ 2.033456] ret_from_fork+0x10/0x20
------------------------------------------
IMO we should check something like:
----------------------8<--------------------------
if (tz->passive)
tz->passive--;
--------------------->8---------------------------
because we start from '0' as init value.
> } else {
> td->threshold -= trip->hysteresis;
> }
> @@ -402,8 +405,10 @@ static void handle_thermal_trip(struct t
> td->notify_temp = trip->temperature;
> td->threshold -= trip->hysteresis;
>
> - if (trip->type == THERMAL_TRIP_CRITICAL ||
> - trip->type == THERMAL_TRIP_HOT)
> + if (trip->type == THERMAL_TRIP_PASSIVE)
> + tz->passive++;
Also, we have to make sure here, that we are align after some change
and be able to return with value to 0 (to switch to 'passive'
slow checks).
Regards,
Lukasz
Powered by blists - more mailing lists