Date:	Mon, 30 Nov 2015 21:50:11 -0800
From:	Guenter Roeck <linux@...ck-us.net>
To:	Nishanth Menon <nm@...com>, Jean Delvare <jdelvare@...e.com>
Cc:	linux-kernel@...r.kernel.org, lm-sensors@...sensors.org,
	linux-omap@...r.kernel.org, beagleboard-x15@...glegroups.com,
	Eduardo Valentin <edubezval@...il.com>
Subject: Re: [PATCH] hwmon: (tmp102) Force wait for conversion time for the
 first valid data

On 11/30/2015 08:25 PM, Nishanth Menon wrote:
> TMP102 works based on conversions done periodically. However, as per
> the TMP102 data sheet[1] the first conversion is triggered immediately
> after we program the configuration register. The temperature data
> registers do not reflect proper data until the first conversion is
> complete (in our case HZ/4).
>
> The driver currently sets the last_update to be jiffies - HZ, just
> after the configuration is complete. When tmp102 driver registers
> with the thermal framework, it immediately tries to read the sensor
> temperature data. This takes place even before the conversion on the
> TMP102 is complete and results in an invalid temperature read.
>
> Depending on the value read, this may cause thermal framework to
> assume that a critical temperature event has occurred and attempts to
> shutdown the system.
>
> Instead of causing an invalid mid-conversion value to be read
> erroneously, we mark the last_update to be in-line with the current
> jiffies. This allows the tmp102_update_device function to skip update
> until the required conversion time is complete. Further, we ensure to
> return -EAGAIN result instead of returning spurious temperature (such
> as 0C) values to the caller to prevent any wrong decisions made with
> such values.
>
> A simpler alternative approach could be to sleep in the probe for the
> duration required, but that will result in latency that is undesirable
> that can delay boot sequence un-necessarily.
>
A much simpler solution would be to record in the probe function the time
at which the device becomes ready to be accessed, and to sleep for the
remaining time in the update function if necessary. This would not affect the
probe function, would avoid the somewhat awkward -EAGAIN, would avoid
overloading the value cache, and would sleep only if necessary and only for as
long as needed.
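
To illustrate, here is a rough userspace sketch of that idea. It is not driver
code: the clock is simulated, all names (sensor_model, model_probe, model_read)
are hypothetical, and the 250 ms conversion time is an assumption derived from
the HZ/4 mentioned above. In the driver the same logic would be expressed in
jiffies with a real sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed conversion time; corresponds to the HZ/4 quoted above. */
#define CONVERSION_TIME_MS 250

/* Simulated monotonic clock, standing in for jiffies. */
static long now_ms;

/* Simulated sleep: just advances the simulated clock. */
static void sleep_ms(long ms)
{
	now_ms += ms;
}

/* Hypothetical stand-in for the driver's private data. */
struct sensor_model {
	long ready_time_ms;	/* earliest time a valid reading exists */
	int raw_mc;		/* pretend register content, milli-degrees C */
};

/* Probe only records when the first conversion completes; it never sleeps. */
static void model_probe(struct sensor_model *s)
{
	assert(s != NULL);
	s->ready_time_ms = now_ms + CONVERSION_TIME_MS;
}

/*
 * Read sleeps for the remaining conversion time, if any, then returns
 * the value. The return value is the time slept, for illustration only.
 */
static long model_read(struct sensor_model *s, int *temp_mc)
{
	long remaining = s->ready_time_ms - now_ms;
	long slept = 0;

	if (remaining > 0) {
		sleep_ms(remaining);
		slept = remaining;
	}
	*temp_mc = s->raw_mc;
	return slept;
}
```

A reader arriving 100 ms after probe sleeps for the remaining 150 ms; any later
reader does not sleep at all, and no sentinel value or error code is needed.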

> [1] http://www.ti.com/lit/ds/symlink/tmp102.pdf
>
> Cc: Eduardo Valentin <edubezval@...il.com>
> Reported-by: Aparna Balasubramanian <aparnab@...com>
> Reported-by: Elvita Lobo <elvita@...com>
> Reported-by: Yan Liu <yan-liu@...com>
> Signed-off-by: Nishanth Menon <nm@...com>
> ---
>
> Example case (from Beagleboard-x15 using an older kernel revision):
> 	http://pastebin.ubuntu.com/13591711/
> Notice the thermal shutdown trigger:
> 	thermal thermal_zone3: critical temperature reached(108 C),shutting down
>
>   drivers/hwmon/tmp102.c | 19 ++++++++++++++++++-
>   1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwmon/tmp102.c b/drivers/hwmon/tmp102.c
> index 65482624ea2c..145f69108f23 100644
> --- a/drivers/hwmon/tmp102.c
> +++ b/drivers/hwmon/tmp102.c
> @@ -50,6 +50,9 @@
>   #define	TMP102_TLOW_REG			0x02
>   #define	TMP102_THIGH_REG		0x03
>
> +/* TMP102 range is -55 to 150C -> we use -128 as a default invalid value */
> +#define TMP102_NOTREADY			-128
> +

This is a bit misleading, and also not correct. The temperature is stored in
milli-degrees C, so a value of -128 represents -0.128 degrees C. While that value
will not be seen in practice, it is still not a good idea to use it for this purpose.

Even though the chip temperature range is -55 .. 150 C, that doesn't mean
it never returns a value outside that range, for example if nothing is connected
to an external sensor or if something is broken.

You should use a value outside the value range, i.e. outside
[-128,000 .. 127,999], to detect the "not ready" condition.
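
As a minimal sketch of what I mean, assuming the range above and using INT_MIN
as the marker (the macro and helper names here are hypothetical, and INT_MIN is
just one possible choice of out-of-range value):

```c
#include <assert.h>
#include <limits.h>

/* Value range in milli-degrees C, as stated above. */
#define TEMP_MIN_MC	(-128000)
#define TEMP_MAX_MC	127999

/* Hypothetical marker; any value outside the range would do. */
#define TEMP_NOTREADY	INT_MIN

/* Compile-time check that the marker cannot collide with a real reading. */
static_assert(TEMP_NOTREADY < TEMP_MIN_MC, "marker must be out of range");

/* Nonzero if mc lies inside the value range. */
static int temp_in_range(int mc)
{
	return mc >= TEMP_MIN_MC && mc <= TEMP_MAX_MC;
}

/* Nonzero once the cached value is no longer the marker. */
static int temp_is_ready(int mc)
{
	return mc != TEMP_NOTREADY;
}
```

With this, -128 (that is, -0.128 degrees C) is treated as an ordinary in-range
reading and not as "not ready".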

>   struct tmp102 {
>   	struct i2c_client *client;
>   	struct device *hwmon_dev;
> @@ -102,6 +105,12 @@ static int tmp102_read_temp(void *dev, int *temp)
>   {
>   	struct tmp102 *tmp102 = tmp102_update_device(dev);
>
> +	/* Is it too early even to return a conversion? */
> +	if (tmp102->temp[0] == TMP102_NOTREADY) {
> +		dev_dbg(dev, "%s: Conversion not ready yet..\n", __func__);
> +		return -EAGAIN;

Does this cause a hard loop in the calling code, or will the thermal code
delay before it reads again?

If it causes a hard loop, it may be better to go to sleep if needed
when reading the data, as suggested above.

> +	}
> +
>   	*temp = tmp102->temp[0];
>
>   	return 0;
> @@ -114,6 +123,10 @@ static ssize_t tmp102_show_temp(struct device *dev,
>   	struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
>   	struct tmp102 *tmp102 = tmp102_update_device(dev);
>
> +	/* Is it too early even to return a read? */
> +	if (tmp102->temp[sda->index] == TMP102_NOTREADY)
> +		return -EAGAIN;
> +
>   	return sprintf(buf, "%d\n", tmp102->temp[sda->index]);
>   }
>
> @@ -207,7 +220,11 @@ static int tmp102_probe(struct i2c_client *client,
>   		status = -ENODEV;
>   		goto fail_restore_config;
>   	}
> -	tmp102->last_update = jiffies - HZ;
> +	tmp102->last_update = jiffies;
> +	/* Mark that we are not ready with data until conversion is complete */
> +	tmp102->temp[0] = TMP102_NOTREADY;
> +	tmp102->temp[1] = TMP102_NOTREADY;
> +	tmp102->temp[2] = TMP102_NOTREADY;
>   	mutex_init(&tmp102->lock);
>
>   	hwmon_dev = hwmon_device_register_with_groups(dev, client->name,
>
