[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c0333a78-a834-6eed-5c80-7fcd31aa666d@linaro.org>
Date: Tue, 21 Feb 2023 17:08:00 +0100
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: Guenter Roeck <linux@...ck-us.net>
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
Amit Kucheria <amitk@...nel.org>,
Zhang Rui <rui.zhang@...el.com>, rafael@...nel.org
Subject: Re: [PATCH v1 07/17] thermal/hwmon: Use the thermal API instead
tampering the internals
On 20/02/2023 18:12, Guenter Roeck wrote:
> On Mon, Feb 20, 2023 at 04:39:48PM +0100, Daniel Lezcano wrote:
>> On 20/02/2023 15:11, Guenter Roeck wrote:
>>> On Mon, Feb 20, 2023 at 02:34:08PM +0100, Daniel Lezcano wrote:
>>>> Hi Guenter,
>>>>
>>>> my script should have Cc'ed you but it didn't, so just a heads up this patch
>>>> ;)
>>>>
>>>> On 19/02/2023 15:36, Daniel Lezcano wrote:
>>>>> In this function, there is a guarantee the thermal zone is registered.
>>>>>
>>>>> The sysfs hwmon unregistering will be blocked until we exit the
>>>>> function. The thermal zone is unregistered after the sysfs hwmon is
>>>>> unregistered.
>>>>>
>>>>> When we are in this function, the thermal zone is registered.
>>>>>
>>>>> We can call the thermal_zone_get_crit_temp() function safely and let
>>>>> the function use the lock which is private the thermal core code.
>>>>>
>>>
>>> Hmm, if you say so. That very same call used to cause a crash in
>>> Chromebooks, which is why I had added the locking.
>>
>> Mmh, I see. I guess we can assume thermal_hwmon is part of the core code and
>> remove this change.
>>
>
> Yes. Anyway, the sequence of events was roughly as follows.
>
> - thermal zone is device is registered
> - hwmon device is registered
> - userspace is triggered and starts reading device attributes
> - while userspace has a hwmon attribute open, thermal device is unregistered
> - hwmon device is unregistered (sysfs attribute is still open)
> - hwmon device attribute function is called
> - Since thermal device ops have been released after the thermal device
> was unregistered, trying to call an ops callback fails.
>
> That doesn't normally happen, but the Intel wireless driver has the habit
> of registering a thermal zone early in its probe function, only to unregister
> it immediately afterwards if the probe function fails. If some userspace
> activity is triggered by the hwmon device registration, the thermal and
> hwmon device removal may be timed such that the hwmon devive is removed
> while one (or more) of its attribute files are still open. Normally that
> doesn't matter, but it is fatal here since the ops callbacks are not owned
> by the hwmon device but by the thermal device.
>
> Essentially every ops callback has this problem.
> thermal_zone_get_temp() had it as well, also associated with
> a hwmon sysfs attribute read operation. See commit 1c6b30060777
> ("thermal/core: Ensure that thermal device is registered in
> thermal_zone_get_temp").
>
> If you don't want non-thermal code to access ->ops directly, the thermal
> code would have to provide protected accessor functions, similar to
> thermal_zone_get_temp().
Hopefully we are getting rid of most of the ops soon ... :/
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Powered by blists - more mailing lists