lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4b38b84a-7427-45b1-b8e6-c68c5dbce676@notapiano>
Date:   Mon, 25 Sep 2023 12:20:40 -0400
From:   Nícolas F. R. A. Prado 
        <nfraprado@...labora.com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     "Rafael J . Wysocki" <rafael@...nel.org>, kernel@...labora.com,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Amit Kucheria <amitk@...nel.org>,
        Zhang Rui <rui.zhang@...el.com>, linux-kernel@...r.kernel.org,
        linux-pm@...r.kernel.org
Subject: Re: [PATCH] thermal: core: Check correct temperature for thermal
 trip notification

On Mon, Sep 25, 2023 at 12:19:18PM +0200, Daniel Lezcano wrote:
> 
> Hi Nicolas,
> 
> On 22/09/2023 21:27, Nícolas F. R. A. Prado wrote:
> > The thermal trip down notification should be triggered when the
> > temperature goes below the trip temperature minus the hysteresis. But
> > while the temperature is correctly checked against that, the last
> > temperature is instead compared against the trip point temperature. The
> > end result is that the notification won't always be triggered, only when
> > the temperature happens to drop quick enough so that the last
> > temperature was still above the trip point temperature.
> > 
> > Fix the incorrect check.
> > 
> > Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework")
> > Signed-off-by: Nícolas F. R. A. Prado <nfraprado@...labora.com>
> > 
> > ---
> > 
> >   drivers/thermal/thermal_core.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 58533ea75cd9..120fcf23b8e5 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -361,7 +361,7 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip_id)
> >   		    tz->temperature >= trip.temperature)
> >   			thermal_notify_tz_trip_up(tz->id, trip_id,
> >   						  tz->temperature);
> > -		if (tz->last_temperature >= trip.temperature &&
> > +		if (tz->last_temperature >= (trip.temperature - trip.hysteresis) &&
> >   		    tz->temperature < (trip.temperature - trip.hysteresis))
> >   			thermal_notify_tz_trip_down(tz->id, trip_id,
> >   						    tz->temperature);
> 
> We already did a try to fix the thermal trip cross notification but this is
> not sufficient for a full fix.
> 
> We are receiving multiple notifications from the same event, all this due to
> the hysteresis.
> 
> Let's say, we have a trip point T and a hysteresis H.
> 
> There is a trip point crossed the way up when:
> 
> 	last_temperature < T and temperature >= T
> 
> At this point, we send a notification
> 
> Now, the temperature decreases but it stays in the hysteresis:
> 
> 	last_temperature >= T and temperature > (T - H)
> 
> And then, the temperature increases again and is greater than T.
> 
> 	last_temperature > (T - H) and temperature >= T
> 
> We send a second notification.

Right, I've observed this issue in the logic that updates the trip points, and
just sent the v2 fix for it:

https://lore.kernel.org/all/20230922184425.290894-1-nfraprado@collabora.com

I wasn't aware of the cleanups you pointed to below, but at least in their
current form it doesn't seem they would take care of fixing the trip point
update logic like I did in that patch. So please do take a look.

It was while testing that patch that I stumbled upon this code for the
notification and just quickly fixed it. But indeed this patch is not a full fix
as the one you pointed to, so let's wait for that one.

Thanks,
Nícolas

> 
> With the mitigation kicking in at temp >= T, we end up with multiple events
> 'temperature crossed the trip point the way up"'. That forces the receiver
> of the events to detect duplicate events in order to ignore them.
> 
> More info:
> 
> 	https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org/
> 	
> We have done a lot of cleanups to use the 'generic trip points' and remove
> those get_trip_* ops. So the trip point structure is a core component being
> reused by the drivers and registered as an array.
> 
> Next step is to make sure the trip points are ordered in the array, so we
> can implement the correct trip point crossing detection.
> 
> 
> 
> 
> -- 
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ