Date:   Wed, 8 Feb 2023 17:35:22 +0200
From:   Mikko Perttunen <cyndis@...si.fi>
To:     Thierry Reding <thierry.reding@...il.com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Amit Kucheria <amitk@...nel.org>,
        Zhang Rui <rui.zhang@...el.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Srikar Srimath Tirumala <srikars@...dia.com>,
        Mikko Perttunen <mperttunen@...dia.com>,
        Timo Alho <talho@...dia.com>, linux-pm@...r.kernel.org,
        linux-tegra@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] thermal: tegra-bpmp: Always (re)program trip
 temperatures

On 2/8/23 12:43, Thierry Reding wrote:
> On Tue, Feb 07, 2023 at 03:56:09PM +0200, Mikko Perttunen wrote:
>> From: Mikko Perttunen <mperttunen@...dia.com>
>>
>> In the rare case that calculation of trip temperatures would result
>> in the same trip temperatures that were previously programmed, the
>> thermal core skips calling .set_trips.
> 
> That seems like an appropriate optimization.
> 
>> However, presently, if it is not called, we may end up with no trip
>> temperatures programmed at all.
> 
> I have a hard time understanding when this would happen. prev_low_trip
> and prev_high_trip are -INT_MAX and INT_MAX, respectively, so these are
> unlikely to be the result of anything we compute at runtime, based on
> temperatures specified in DT, for example.

Consider:

Temperature is 45C.
set_trips is called with low=40C high=50C. We program accordingly.
Temperature goes to 55C. Trip point triggers.
Before execution gets to CPU, temperature returns to 45C.
CPU gets the MRQ, calls into thermal core to update.
The thermal core sees that the temperature is 45C, computes the same
low=40C high=50C trip points again, and so does not call set_trips.
No trip point is programmed to BPMP and we never get trips again.

The above, of course, is rather unlikely to happen, but theoretically 
possible nevertheless.
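
To make the sequence concrete, here is a toy user-space model of it in C. It is only a sketch, not the thermal core or the tegra-bpmp driver: all names (fake_zone, fake_hw, core_update, hw_sample, fake_set_trips) are made up for illustration, the 10-degree window is an arbitrary stand-in for how the core picks trips, and it assumes the firmware side disarms a trip once it has fired.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical firmware side: assumed to disarm the trip pair once it fires. */
struct fake_hw {
	bool armed;
	int low, high;
};

/* Simplified stand-in for the trip window cached by the thermal core. */
struct fake_zone {
	int prev_low, prev_high;
	int set_trips_calls;
	struct fake_hw hw;
};

static void zone_init(struct fake_zone *z)
{
	z->prev_low = -INT_MAX;
	z->prev_high = INT_MAX;
	z->set_trips_calls = 0;
	z->hw.armed = false;
	z->hw.low = 0;
	z->hw.high = 0;
}

/* Stand-in for the driver's .set_trips: program and arm the hardware. */
static void fake_set_trips(struct fake_zone *z, int low, int high)
{
	assert(low < high);
	z->hw.low = low;
	z->hw.high = high;
	z->hw.armed = true;
	z->set_trips_calls++;
}

/* Core update: choose a window around temp, skip .set_trips if unchanged. */
static void core_update(struct fake_zone *z, int temp)
{
	int low = (temp / 10) * 10;	/* illustrative window, 45 -> 40..50 */
	int high = low + 10;

	if (low == z->prev_low && high == z->prev_high)
		return;			/* the optimization under discussion */

	z->prev_low = low;
	z->prev_high = high;
	fake_set_trips(z, low, high);
}

/* Returns true if an armed trip fires at temp; firing disarms it. */
static bool hw_sample(struct fake_zone *z, int temp)
{
	if (!z->hw.armed || (temp >= z->hw.low && temp <= z->hw.high))
		return false;
	z->hw.armed = false;
	return true;
}
```

Running core_update(45), then hw_sample(55), then core_update(45) again leaves set_trips_calls at 1 and hw.armed false, which is the stuck state described above.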

Alternatively, the way I originally discovered the problem was the
situation described in the last paragraph of the commit message; see below.

> 
> So I would expect ->set_trips() to get called at least once when the
> thermal zones are first registered. Are you saying there are cases where
> ->set_trips() doesn't get called at all?

No, not saying that. It will get called when registering the zone 
initially, but see below.

> 
>> To avoid this, make set_trips a no-op and in places where it would be
>> called, instead unconditionally program trip temperatures to the last
>> specified temperatures.
> 
> Again, this seems more like a workaround for an issue that exists
> elsewhere. If ->set_trips() doesn't always get called when it should be,
> then that's what we should fix.

I think it depends on how set_trips is interpreted. If the
interpretation is that the trips configured in the hardware are
persistent (not disabled when a trip occurs), then the current
implementation and this patch make sense. Otherwise a change in the
thermal core would make sense.
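
For the first interpretation, the driver-side approach can be sketched in the same toy style. Again this is a user-space illustration with made-up names (fake_zone, fake_program_trips, fake_handle_trip), not the code from the patch, and it assumes the driver can read back the last window the core chose.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct fake_hw {
	bool armed;
	int low, high;
};

struct fake_zone {
	int prev_low, prev_high;	/* window cached by the modelled core */
	struct fake_hw hw;
};

/* .set_trips as a no-op: the core may skip it, so nothing depends on it. */
static int fake_set_trips(struct fake_zone *z, int low, int high)
{
	(void)z;
	(void)low;
	(void)high;
	return 0;
}

/* Modelled core update, still skipping .set_trips for an unchanged window. */
static void core_update(struct fake_zone *z, int temp)
{
	int low = (temp / 10) * 10;	/* illustrative window, 45 -> 40..50 */
	int high = low + 10;

	if (low != z->prev_low || high != z->prev_high) {
		z->prev_low = low;
		z->prev_high = high;
		fake_set_trips(z, low, high);
	}
}

/* Unconditionally program and re-arm using the last specified window. */
static void fake_program_trips(struct fake_zone *z)
{
	assert(z->prev_low < z->prev_high);
	z->hw.low = z->prev_low;
	z->hw.high = z->prev_high;
	z->hw.armed = true;
}

/* Modelled trip handling: let the core update, then always reprogram. */
static void fake_handle_trip(struct fake_zone *z, int temp_now)
{
	core_update(z, temp_now);
	fake_program_trips(z);
}
```

With this shape, whether the core calls or skips set_trips no longer affects whether the hardware ends up armed after a trip has been handled.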

> 
>> This also fixes the situation where a trip is triggered between
>> registering a thermal zone and registering the trip MRQ handler, in
>> which case we would also get stuck.
> 
> Could this be fixed by requesting the MRQ prior to registering the
> zones? That seems like the more appropriate fix for this issue. It's
> similar to how we typically register IRQ handlers before enabling a
> device to make sure we don't miss any interrupts.

I considered that -- there are two reasons I didn't go for it:

1. It doesn't solve the race condition described in the first part of 
the message
2. To handle the incoming MRQ, zone->tzd needs to be set. But we only 
get tzd from the zone registration call, and already before that call 
returns, set_trips has been called and we might have received an MRQ. I 
tested using a completion object to block in the MRQ handler until the 
initialization completes, but that's pretty ugly as well.

> 
> Thierry

Thanks,
Mikko
