[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0jkiAkmFU8c+0tfD3R+wZBieOmsqezCDTE91UT_hzW48w@mail.gmail.com>
Date: Tue, 30 Apr 2024 17:08:13 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Lukasz Luba <lukasz.luba@....com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, "Rafael J. Wysocki" <rjw@...ysocki.net>,
Daniel Lezcano <daniel.lezcano@...aro.org>, Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] thermal: core: Move passive polling management to the core
On Tue, Apr 30, 2024 at 4:58 PM Lukasz Luba <lukasz.luba@....com> wrote:
>
> On 4/30/24 14:48, Rafael J. Wysocki wrote:
> > Hi Lukasz,
> >
> > On Mon, Apr 29, 2024 at 11:21 PM Lukasz Luba <lukasz.luba@....com> wrote:
> >>
> >> Hi Rafael,
> >>
> >> On 4/25/24 15:11, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >>>
> >>> Passive polling is enabled by setting the 'passive' field in
> >>> struct thermal_zone_device to a positive value so long as the
> >>> 'passive_delay_jiffies' field is greater than zero. It causes
> >>> the thermal core to actively check the thermal zone temperature
> >>> periodically which in theory should be done after crossing a
> >>> passive trip point on the way up in order to allow governors to
> >>> react more rapidly to temperature changes and adjust mitigation
> >>> more precisely.
> >>>
> >>> However, the 'passive' field in struct thermal_zone_device is currently
> >>> managed by governors which is quite problematic. First of all, only
> >>> two governors, Step-Wise and Power Allocator, update that field at
> >>> all, so the other governors do not benefit from passive polling,
> >>> although in principle they should. Moreover, if the zone governor is
> >>> changed from, say, Step-Wise to Fair-Share after 'passive' has been
> >>> incremented by the former, it is not going to be reset back to zero by
> >>> the latter even if the zone temperature falls down below all passive
> >>> trip points.
> >>>
> >>> For this reason, make handle_thermal_trip() increment 'passive'
> >>> to enable passive polling for the given thermal zone whenever a
> >>> passive trip point is crossed on the way up and decrement it
> >>> whenever a passive trip point is crossed on the way down. Also
> >>> remove the 'passive' field updates from governors and additionally
> >>> clear it in thermal_zone_device_init() to prevent passive polling
> >>> from being enabled after a system resume just beacuse it was enabled
> >>> before suspending the system.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >>> ---
> >>>
> >>> This has been mentioned here:
> >>>
> >>> https://lore.kernel.org/linux-pm/61560bc6-d453-4b0c-a4ea-b375d547b143@linaro.org/
> >>>
> >>> and I need someone to double check if the Power Allocator governor does not
> >>> need to be adjusted more for this change.
> >>>
> >>> ---
> >>> drivers/thermal/gov_power_allocator.c | 12 +++++++-----
> >>> drivers/thermal/gov_step_wise.c | 10 ----------
> >>> drivers/thermal/thermal_core.c | 10 ++++++++--
> >>> 3 files changed, 15 insertions(+), 17 deletions(-)
> >>>
> >>> Index: linux-pm/drivers/thermal/thermal_core.c
> >>> ===================================================================
> >>> --- linux-pm.orig/drivers/thermal/thermal_core.c
> >>> +++ linux-pm/drivers/thermal/thermal_core.c
> >>> @@ -389,6 +389,9 @@ static void handle_thermal_trip(struct t
> >>> if (tz->temperature < trip->temperature - trip->hysteresis) {
> >>> list_add(&td->notify_list_node, way_down_list);
> >>> td->notify_temp = trip->temperature - trip->hysteresis;
> >>> +
> >>> + if (trip->type == THERMAL_TRIP_PASSIVE)
> >>> + tz->passive--;
> >>
> >> This gets negative values and than the core switches to fast 'polling'
> >> mode. The values is decremented from 0 each time the
> >> thermal_zone_device_enable() is called.
> >
> > Interesting.
> >
> > This shouldn't happen because it means that the passive trip has been
> > crossed on the way down, but it wasn't crossed on the way up.
> >
> > It looks like an initialization issue to me.
> >
> >> Then IPA is actually called every 100ms after boot w/ low temp,
> >> while it should 1s.
> >>
> >> Please see the logs below:
> >> 'short log' after boot
> >> ----------------------------------------------
> >>
> >> [ 1.632670] thermal_sys: TZ: tz_id=0 passive-- = -1
> >> [ 1.637984] thermal_sys: TZ: tz_id=0 passive-- = -2
> >> [ 1.643641] thermal_sys: TZ: tz_id=1 passive-- = -1
> >> ----------------------------------------------
> >>
> >> long log with call stack dumped
> >> ----------------------------------------------
> >>
> >> [ 1.632973] thermal_sys: TZ: tz_id=0 passive-- = -1
> >> [ 1.638295] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc5+ #28
> >> [ 1.645409] Hardware name: Radxa ROCK 4SE (DT)
> >> [ 1.650376] Call trace:
> >> [ 1.653109] dump_backtrace+0x9c/0x100
> >> [ 1.657309] show_stack+0x20/0x38
> >> [ 1.661017] dump_stack_lvl+0xc0/0xd0
> >> [ 1.665119] dump_stack+0x18/0x28
> >> [ 1.668828] __thermal_zone_device_update+0x1fc/0x550
> >> [ 1.674484] thermal_zone_device_set_mode+0x64/0xc0
> >> [ 1.679943] thermal_zone_device_enable+0x1c/0x30
> >> [ 1.685206] thermal_of_zone_register+0x34c/0x590
> >
> > So let's see.
> >
> > thermal_of_zone_register() calls
> > thermal_zone_device_register_with_trips() which calls
> > thermal_zone_device_update() for the first time, but
> > __thermal_zone_device_update() sees that
> > thermal_zone_device_is_enabled() returns false, so it does nothing.
> >
> > This is right after thermal_zone_device_init() has been called, so
> > tz->temperature == THERMAL_TEMP_INVALID and tz->passive == 0.
> >
> > Next, thermal_zone_device_enable() is called by
> > thermal_of_zone_register() and it calls __thermal_zone_device_update()
> > via thermal_zone_device_set_mode().
> >
> > This time thermal_zone_device_is_enabled() returns true, so
> > update_temperature() is called and, unless __thermal_zone_get_temp()
> > returns an error, it sets tz->last_temperature to THERMAL_TEMP_INVALID
> > and tz->temperature to the current zone temperature.
> >
> > Next, handle_thermal_trip() is called for all trips and it sees that
> > tz->last_temperature == THERMAL_TEMP_INVALID, so it skips the branch
> > in which tz->passive is decremented.
> >
> > The only case I can see in which something else can happen in when
> > __thermal_zone_get_temp() called by update_temperature() returns an
> > error code (and if it is -EAGAIN, it won't even trigger a warning
> > message) in which case the error is silently discarded and
> > __thermal_zone_device_update() happily proceeds with tz->temperature
> > == THERMAL_TEMP_INVALID and tz->last_temperature == 0.
>
> That correct.
>
> >
> > This can lead to many surprises down the road, so IMV
> > __thermal_zone_device_update() should abort if it sees tz->temperature
> > == THERMAL_TEMP_INVALID past calling update_temperature().
>
> agree
>
> >
> > So I'm wondering if the patch below (modulo white-space damage from
> > GMail) helps.
> >
> >> [ 1.690473] devm_thermal_of_zone_register+0x6c/0xc0
> >> [ 1.696031] rockchip_thermal_probe+0x238/0x5e8
> >> [ 1.701106] platform_probe+0x70/0xe8
> >> [ 1.705208] really_probe+0xc4/0x278
> >> [ 1.709205] __driver_probe_device+0x80/0x140
> >> [ 1.714078] driver_probe_device+0x48/0x130
> >> [ 1.718756] __driver_attach+0x7c/0x138
> >> [ 1.723045] bus_for_each_dev+0x80/0xf0
> >> [ 1.727342] driver_attach+0x2c/0x40
> >> [ 1.731340] bus_add_driver+0xec/0x1f8
> >> [ 1.735539] driver_register+0x68/0x138
> >> [ 1.739828] __platform_driver_register+0x30/0x48
> >> [ 1.745093] rockchip_thermal_driver_init+0x24/0x38
> >> [ 1.750551] do_one_initcall+0x50/0x2d8
> >> [ 1.754844] kernel_init_freeable+0x204/0x440
> >> [ 1.759722] kernel_init+0x28/0x140
> >> [ 1.763631] ret_from_fork+0x10/0x20
> >> [ 1.767802] thermal_sys: TZ: tz_id=0 passive-- = -2
> >
> > ---
> > drivers/thermal/thermal_core.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > Index: linux-pm/drivers/thermal/thermal_core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_core.c
> > +++ linux-pm/drivers/thermal/thermal_core.c
> > @@ -496,6 +496,9 @@ void __thermal_zone_device_update(struct
> >
> > update_temperature(tz);
> >
> > + if (tz->temperature == THERMAL_TEMP_INVALID)
> > + return;
> > +
> > tz->notify_event = event;
> >
> > for_each_trip_desc(tz, td)
>
> Yes, it prevents that previous situation. I have added a debug print
> before your return in the code above and the it's in the log:
>
> [ 1.632520] thermal_sys: TZ: THERMAL_TEMP_INVALID, return
> [ 1.638899] thermal_sys: TZ: THERMAL_TEMP_INVALID, return
>
> The tracing also shows that we have only 1s slow polling.
> It also works properly in IPA when crossing 2 trip points
> and coming back to low temp.
>
> So, that code above should be OK. Thanks!
Thanks for the confirmation!
Let me put the fix patch and the $subject one in a series and resend
them together.
Powered by blists - more mailing lists