[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4b38b84a-7427-45b1-b8e6-c68c5dbce676@notapiano>
Date: Mon, 25 Sep 2023 12:20:40 -0400
From: Nícolas F. R. A. Prado
<nfraprado@...labora.com>
To: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: "Rafael J . Wysocki" <rafael@...nel.org>, kernel@...labora.com,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>,
Amit Kucheria <amitk@...nel.org>,
Zhang Rui <rui.zhang@...el.com>, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org
Subject: Re: [PATCH] thermal: core: Check correct temperature for thermal
trip notification
On Mon, Sep 25, 2023 at 12:19:18PM +0200, Daniel Lezcano wrote:
>
> Hi Nicolas,
>
> On 22/09/2023 21:27, Nícolas F. R. A. Prado wrote:
> > The thermal trip down notification should be triggered when the
> > temperature goes below the trip temperature minus the hysteresis. But
> > while the temperature is correctly checked against that, the last
> > temperature is instead compared against the trip point temperature. The
> > end result is that the notification won't always be triggered, only when
> > the temperature happens to drop quick enough so that the last
> > temperature was still above the trip point temperature.
> >
> > Fix the incorrect check.
> >
> > Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework")
> > Signed-off-by: Nícolas F. R. A. Prado <nfraprado@...labora.com>
> >
> > ---
> >
> > drivers/thermal/thermal_core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 58533ea75cd9..120fcf23b8e5 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -361,7 +361,7 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip_id)
> > tz->temperature >= trip.temperature)
> > thermal_notify_tz_trip_up(tz->id, trip_id,
> > tz->temperature);
> > - if (tz->last_temperature >= trip.temperature &&
> > + if (tz->last_temperature >= (trip.temperature - trip.hysteresis) &&
> > tz->temperature < (trip.temperature - trip.hysteresis))
> > thermal_notify_tz_trip_down(tz->id, trip_id,
> > tz->temperature);
>
> We already did a try to fix the thermal trip cross notification but this is
> not sufficient for a full fix.
>
> We are receiving multiple notifications from the same event, all this due to
> the hysteresis.
>
> Let's say, we have a trip point T and a hysteresis H.
>
> There is a trip point crossed the way up when:
>
> last_temperature < T and temperature >= T
>
> At this point, we send a notification
>
> Now, the temperature decreases but it stays in the hysteresis:
>
> last_temperature >= T and temperature > (T - H)
>
> And then, the temperature increases again and is greater than T.
>
> last_temperature > (T - H) and temperature >= T
>
> We send a second notification.
Right, I've observed this issue in the logic that updates the trip points, and
just sent the v2 fix for it:
https://lore.kernel.org/all/20230922184425.290894-1-nfraprado@collabora.com
I wasn't aware of the cleanups you pointed to below, but at least in their
current form it doesn't seem they would take care of fixing the trip point
update logic like I did in that patch. So please do take a look.
It was while testing that patch that I stumbled upon this code for the
notification and just quickly fixed it. But indeed this patch is not a full fix
as the one you pointed to, so let's wait for that one.
Thanks,
Nícolas
>
> With the mitigation kicking in at temp >= T, we end up with multiple events
> 'temperature crossed the trip point the way up"'. That forces the receiver
> of the events to detect duplicate events in order to ignore them.
>
> More info:
>
> https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org/
>
> We have done a lot of cleanups to use the 'generic trip points' and remove
> those get_trip_* ops. So the trip point structure is a core component being
> reused by the drivers and registered as an array.
>
> Next step is to make sure the trip points are ordered in the array, so we
> can implement the correct trip point crossing detection.
>
>
>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
Powered by blists - more mailing lists