[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <184d5217-149f-4f6a-b4a6-f897a8391873@roeck-us.net>
Date: Tue, 4 Mar 2025 07:04:04 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: maudspierings@...ontroll.com, Jean Delvare <jdelvare@...e.com>
Cc: linux-hwmon@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hwmon: (ntc_thermistor) return error instead of clipping
on OOB
On 3/4/25 06:10, Maud Spierings via B4 Relay wrote:
> From: Maud Spierings <maudspierings@...ontroll.com>
>
> When the ntc is reading Out Of Bounds instead of clipping to the nearest
> limit (min/max) return -ENODATA. This prevents malfunctioning sensors
> from sending a device into a shutdown loop due to a critical trip.
>
> This implementation will only work for ntc type thermistors if a ptc
> type is to be implemented the min/max ohm calculation must be adjusted
> to take that into account.
>
> Signed-off-by: Maud Spierings <maudspierings@...ontroll.com>
> ---
> This patch is a continuation of another discussion [1]. I felt like it
> should be a new patch, not a v2 as this is a very different change.
>
> I have left the clamping of n to INT_MAX in the code with a comment, but
> it may be possible to find a better solution to it. One I thought of is
> to make the ohm field of the ntc_compensation struct a signed int as
> well. It would get rid of this weird edge case, but it doesn't make
> sense to allow for negative resistances to be entered into the sensor
> table.
>
> Currently the only feedback this provides to the user is when they
> manually try to read the temperature and it returns the error. I have
> added a simple printk to these error points to see how spammy it gets
> and this is the result:
>
> dmesg | grep hwmon
> [ 4.982682] hwmon: sensor out of bounds
> [ 5.249758] hwmon: sensor out of bounds
> [ 5.633729] hwmon: sensor out of bounds
> [ 6.215285] hwmon: sensor out of bounds
> [ 7.073882] hwmon: sensor out of bounds
> [ 7.486620] hwmon: sensor out of bounds
> [ 8.833765] hwmon: sensor out of bounds
> [ 10.785969] hwmon: sensor out of bounds
> [ 13.793722] hwmon: sensor out of bounds
> [ 16.761124] hwmon: sensor out of bounds
> [ 17.889706] hwmon: sensor out of bounds
> [ 25.057715] hwmon: sensor out of bounds
> [ 35.041725] hwmon: sensor out of bounds
> [ 50.110346] hwmon: sensor out of bounds
> [ 72.945283] hwmon: sensor out of bounds
> [ 105.712619] hwmon: sensor out of bounds
> [ 154.863976] hwmon: sensor out of bounds
> [ 164.937104] hwmon: sensor out of bounds
> [ 228.590909] hwmon: sensor out of bounds
> [ 315.365777] hwmon: sensor out of bounds
> [ 464.718403] hwmon: sensor out of bounds
> [ 615.079123] hwmon: sensor out of bounds
> [ 764.496780] hwmon: sensor out of bounds
>
> This is with polling-delay set to 1000, it seems to rate-limit itself?
> But I feel there should be a better way to communicate the potential
> sensor failure to the user, but I can't think of anything.
>
Hmm ... we could add "fault" attributes and report a sensor fault
if uv == 0 or uv >= puv, together with the -ENODATA temperature reading.
That could distinguish the fault from the "resistor value out of range" case.
> [1]: https://lore.kernel.org/all/20250304-ntc_min_max-v1-1-b08e70e56459@gocontroll.com/
Most of the above should be after "---" since it is irrelevant for the commit log.
> ---
> drivers/hwmon/ntc_thermistor.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/hwmon/ntc_thermistor.c b/drivers/hwmon/ntc_thermistor.c
> index 0d29c8f97ba7c2f264588b6309b91ca494012ad6..311a60498d409ea068a18590415b2d5b43e73eb1 100644
> --- a/drivers/hwmon/ntc_thermistor.c
> +++ b/drivers/hwmon/ntc_thermistor.c
> @@ -387,12 +387,9 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv)
> puo = data->pullup_ohm;
> pdo = data->pulldown_ohm;
>
> - if (uv == 0)
> - return (data->connect == NTC_CONNECTED_POSITIVE) ?
> - INT_MAX : 0;
> - if (uv >= puv)
> - return (data->connect == NTC_CONNECTED_POSITIVE) ?
> - 0 : INT_MAX;
> + /* faulty adc value */
> + if (uv == 0 || uv >= puv)
> + return -ENODATA;
>
> if (data->connect == NTC_CONNECTED_POSITIVE && puo == 0)
> n = div_u64(pdo * (puv - uv), uv);
> @@ -404,6 +401,16 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv)
> else
> n = div64_u64_safe(pdo * puo * uv, pdo * (puv - uv) - puo * uv);
>
> + /* sensor out of bounds */
> + if (n > data->comp[0].ohm || n < data->comp[data->n_comp-1].ohm)
> + return -ENODATA;
> +
> + /*
> + * theoretically data->comp[0].ohm can be greater than INT_MAX as it is an
> + * unsigned integer, but it doesn't make any sense for it to be so as the
> + * maximum return value of this function is INT_MAX, so it will never be
> + * able to properly calculate that temperature.
> + */
> if (n > INT_MAX)
> n = INT_MAX;
I am not a friend of theoretic code or comments like this. Please drop.
The original code was intended to catch out-of-bounds values which would
otherwise have been reported as error, not to catch unrealistic resistor
values in the compensation tables.
Thanks,
Guenter
Powered by blists - more mailing lists