Message-ID: <bdb8b51f-93ac-9f99-914e-e1ce16c0076d@roeck-us.net>
Date: Mon, 21 Feb 2022 08:02:15 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Jon Hunter <jonathanh@...dia.com>,
Dmitry Osipenko <digetx@...il.com>,
Jean Delvare <jdelvare@...e.com>
Cc: linux-kernel@...r.kernel.org, linux-hwmon@...r.kernel.org,
linux-tegra@...r.kernel.org
Subject: Re: [PATCH v3 2/4] hwmon: (lm90) Use hwmon_notify_event()

On 2/21/22 07:49, Jon Hunter wrote:
>
> On 21/02/2022 15:43, Guenter Roeck wrote:
>
> ...
>
>>> We observed a random null pointer dereference crash somewhere in the
>>> thermal core (crash log below is not very helpful) when calling
>>> mutex_lock(). It looks like we get an interrupt when this crash
>>> happens.
>>>
>>> Looking at the lm90 driver, per the above, I now see we are calling
>>> hwmon_notify_event() from the lm90 interrupt handler. Looking at
>>> hwmon_notify_event() I see that ...
>>>
>>> hwmon_notify_event()
>>> --> hwmon_thermal_notify()
>>> --> thermal_zone_device_update()
>>> --> update_temperature()
>>> --> mutex_lock()
>>>
>>> So although I don't completely understand the crash, it does seem
>>> that we should not be calling hwmon_notify_event() from the
>>> interrupt handler.
>>>
>> As mentioned separately, this is not the problem.
>
> Yes I can see that now.
>
>> I think the problem may be that this is not a devicetree system
>> (or the lm90 device does not have a devicetree node), but thermal
>> notification currently only works in such systems because the hwmon
>> subsystem uses the devicetree registration method. At the same time,
>> CONFIG_THERMAL_OF is obviously enabled. Unfortunately, the hwmon code
>> does not bail out in that situation due to another bug.
>
> The platform I see this on does use device-tree and it does have a
> node for the ti,tmp451 device which uses the lm90 driver. This
> platform uses the device-tree source
> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts and the tmp451
> node is in arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi.
>
Interesting. It appears that the call to devm_thermal_zone_of_sensor_register()
in the hwmon core nevertheless returns -ENODEV, which the hwmon core does not
handle properly. I can see a number of reasons for this to happen:

- there is no devicetree node for the lm90 device
- there is no thermal-zones devicetree node
- there is no thermal zone entry in the thermal-zones node which matches
  the sensor
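
To illustrate the failure mode, here is a minimal userspace sketch of how an
-ENODEV error pointer from registration could be tolerated so that a later
notification does not touch an invalid zone pointer. This is not the hwmon
core code and not a proposed patch. All names (err_ptr, is_err, ptr_err,
struct zone, struct sensor, register_zone, sensor_add, sensor_notify) are
hypothetical stand-ins, and register_zone() only mimics the
error-pointer return convention of devm_thermal_zone_of_sensor_register().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical, simplified thermal zone and sensor records. */
struct zone {
	int updates;		/* counts notifications that reached the zone */
};

struct sensor {
	struct zone *tzd;	/* NULL when no thermal zone is attached */
};

/*
 * Hypothetical registration helper: returns the matching zone, or an
 * error pointer carrying 'err' when no zone matches the sensor.
 */
static struct zone *register_zone(struct zone *match, int err)
{
	return match ? match : err_ptr(err);
}

/*
 * Treat -ENODEV as "sensor is not part of any thermal zone" and carry on
 * without a zone; propagate every other error. An error pointer is never
 * stored in the sensor record.
 */
static int sensor_add(struct sensor *s, struct zone *match, int err)
{
	struct zone *tzd;

	assert(s != NULL);
	s->tzd = NULL;

	tzd = register_zone(match, err);
	if (is_err(tzd)) {
		if (ptr_err(tzd) != -ENODEV)
			return (int)ptr_err(tzd);
		return 0;	/* no zone attached, not fatal */
	}

	s->tzd = tzd;
	return 0;
}

/* Notify the zone only if one is attached; returns 1 if it was notified. */
static int sensor_notify(struct sensor *s)
{
	if (!s->tzd)
		return 0;
	s->tzd->updates++;
	return 1;
}
```

With this shape, a sensor whose registration reports -ENODEV is still added
successfully, and a later sensor_notify() on it is a no-op rather than a
dereference of an error pointer.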

We'll have to revert the lm90 changes until this is sorted out.

Guenter