lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aLXtRvJBrPGNmRZ3@stanley.mountain>
Date: Mon, 1 Sep 2025 22:00:22 +0300
From: Dan Carpenter <dan.carpenter@...aro.org>
To: Mohammad Amin Hosseini <moahmmad.hosseinii@...il.com>
Cc: linux-iio@...r.kernel.org, linux-staging@...ts.linux.dev,
	linux-kernel@...r.kernel.org, gregkh@...uxfoundation.org,
	jic23@...nel.org, lars@...afoo.de, Michael.Hennerich@...log.com,
	dlechner@...libre.com, nuno.sa@...log.com, andy@...nel.org,
	sonic.zhang@...log.com, vapier@...too.org
Subject: Re: [PATCH v4] staging: iio: adc: ad7816: fix race condition in SPI
 operations

On Mon, Sep 01, 2025 at 07:33:10PM +0330, Mohammad Amin Hosseini wrote:
> The ad7816 driver lacks proper synchronization around SPI operations
> and device state access. Concurrent access from multiple threads can
> lead to data corruption and inconsistent device state.
> 
> The driver performs sequences of GPIO pin manipulations followed by
> SPI transactions without any locking. Device state variables (mode,
> channel_id, oti_data) are also accessed without synchronization.
> 
> This bug was found through manual code review using static analysis
> techniques. The review focused on identifying unsynchronized access
> patterns to shared resources. Key indicators were:
> - GPIO pin state changes followed by SPI operations without atomicity
> - Shared state variables accessed from multiple sysfs entry points
> - No mutex or spinlock protection around sections
> - Potential for interleaved execution in multi-threaded environments
> 
> The review methodology involved tracing data flow paths and identifying
> points where concurrent access could corrupt device state or SPI
> communication sequences.
> 
> Add io_lock mutex to protect:
> - SPI transactions and GPIO sequences in read/write functions
> - Device state variables in sysfs show/store functions
> - Concurrent access to chip configuration
> 
> This prevents race conditions when multiple processes access the device
> simultaneously through sysfs attributes or device file operations.
> 
> Fixes: 7924425db04a ("staging: iio: adc: new driver for AD7816 devices")
> 
> Signed-off-by: Mohammad Amin Hosseini <moahmmad.hosseinii@...il.com>
> 
> ---
> Changes in v4:
> - Added locking to reader functions (show_mode, show_channel, show_oti)
> - Fixed incomplete reader/writer synchronization that could still race
> - Ensured all device state access is properly synchronized
> - Replace sprintf() with sysfs_emit() in all sysfs show functions
> - Use sysfs_streq() instead of strcmp() for proper input parsing
> - Implement locked/unlocked SPI function variants to prevent deadlock
> - Use channel snapshot to ensure atomic read operations
> - Fix sizeof() usage in spi_read to be more explicit (sizeof(buf))
> - Make oti write operations atomic (SPI write + shadow update under lock)
> - Fix race condition in ad7816_set_oti() by taking channel_id snapshot under lock
> - Fix return type consistency (ssize_t vs int) in show functions
> - Use chip->id instead of string comparison for channel validation
> - Add explicit cast for narrowing assignment
> - Add default case for unknown chip ID validation
> - Use cansleep GPIO variants in sleepable context

Why?  This isn't described in the commit message.

(This is not a rhetorical question and the answer will change how we
approach this).

> - Improve lock documentation for protected resources
> ---
>  drivers/staging/iio/adc/ad7816.c | 210 ++++++++++++++++++++-----------
>  1 file changed, 138 insertions(+), 72 deletions(-)
> 
> diff --git a/drivers/staging/iio/adc/ad7816.c b/drivers/staging/iio/adc/ad7816.c
> index 4774df778de9..49a67e1b76f6 100644
> --- a/drivers/staging/iio/adc/ad7816.c
> +++ b/drivers/staging/iio/adc/ad7816.c
> @@ -50,6 +50,15 @@ struct ad7816_chip_info {
>  	u8  oti_data[AD7816_CS_MAX + 1];
>  	u8  channel_id;	/* 0 always be temperature */
>  	u8  mode;
> +  /*
> +   * Protects:
> +   *  - SPI transactions
> +   *  - GPIO toggling
> +   *  - channel_id
> +   *  - mode
> +   *  - oti_data
> +   */

Indenting is wrong.

> +	struct mutex io_lock;
>  };
>  
>  enum ad7816_type {
> @@ -59,60 +68,71 @@ enum ad7816_type {
>  };
>  
>  /*
> - * ad7816 data access by SPI
> + * ad7816 data access by SPI - locked versions assume io_lock is held
>   */
> -static int ad7816_spi_read(struct ad7816_chip_info *chip, u16 *data)
> +static int ad7816_spi_read_locked(struct ad7816_chip_info *chip, u8 channel, u16 *data)
>  {
>  	struct spi_device *spi_dev = chip->spi_dev;
>  	int ret;
>  	__be16 buf;
>  
> -	gpiod_set_value(chip->rdwr_pin, 1);
> -	gpiod_set_value(chip->rdwr_pin, 0);
> -	ret = spi_write(spi_dev, &chip->channel_id, sizeof(chip->channel_id));
> -	if (ret < 0) {
> -		dev_err(&spi_dev->dev, "SPI channel setting error\n");
> +	gpiod_set_value_cansleep(chip->rdwr_pin, 1);
> +	gpiod_set_value_cansleep(chip->rdwr_pin, 0);
> +	/* AD7816_CS_MASK: broadcast/all-channels per hw programming model */
> +	ret = spi_write(spi_dev, &channel, sizeof(channel));
> +	if (ret < 0)
>  		return ret;
> -	}
> -	gpiod_set_value(chip->rdwr_pin, 1);
> +	gpiod_set_value_cansleep(chip->rdwr_pin, 1);
>  
>  	if (chip->mode == AD7816_PD) { /* operating mode 2 */
> -		gpiod_set_value(chip->convert_pin, 1);
> -		gpiod_set_value(chip->convert_pin, 0);
> +		gpiod_set_value_cansleep(chip->convert_pin, 1);
> +		gpiod_set_value_cansleep(chip->convert_pin, 0);
>  	} else { /* operating mode 1 */
> -		gpiod_set_value(chip->convert_pin, 0);
> -		gpiod_set_value(chip->convert_pin, 1);
> +		gpiod_set_value_cansleep(chip->convert_pin, 0);
> +		gpiod_set_value_cansleep(chip->convert_pin, 1);
>  	}
>  
>  	if (chip->id == ID_AD7816 || chip->id == ID_AD7817) {
> -		while (gpiod_get_value(chip->busy_pin))
> +		while (gpiod_get_value_cansleep(chip->busy_pin))
>  			cpu_relax();
>  	}
>  
> -	gpiod_set_value(chip->rdwr_pin, 0);
> -	gpiod_set_value(chip->rdwr_pin, 1);
> -	ret = spi_read(spi_dev, &buf, sizeof(*data));
> -	if (ret < 0) {
> -		dev_err(&spi_dev->dev, "SPI data read error\n");
> +	gpiod_set_value_cansleep(chip->rdwr_pin, 0);
> +	gpiod_set_value_cansleep(chip->rdwr_pin, 1);
> +	ret = spi_read(spi_dev, &buf, sizeof(buf));
> +	if (ret < 0)
>  		return ret;
> -	}
>  
>  	*data = be16_to_cpu(buf);
> +	return 0;
> +}
>  
> +static int __maybe_unused ad7816_spi_read(struct ad7816_chip_info *chip, u8 channel, u16 *data)
> +{
> +	int ret;
> +
> +	mutex_lock(&chip->io_lock);
> +	ret = ad7816_spi_read_locked(chip, channel, data);
> +	mutex_unlock(&chip->io_lock);
>  	return ret;
>  }
>  
> -static int ad7816_spi_write(struct ad7816_chip_info *chip, u8 data)
> +static int ad7816_spi_write_locked(struct ad7816_chip_info *chip, u8 data)
>  {
>  	struct spi_device *spi_dev = chip->spi_dev;
> -	int ret;
>  
> -	gpiod_set_value(chip->rdwr_pin, 1);
> -	gpiod_set_value(chip->rdwr_pin, 0);
> -	ret = spi_write(spi_dev, &data, sizeof(data));
> -	if (ret < 0)
> -		dev_err(&spi_dev->dev, "SPI oti data write error\n");
> +	gpiod_set_value_cansleep(chip->rdwr_pin, 1);
> +	gpiod_set_value_cansleep(chip->rdwr_pin, 0);
> +	return spi_write(spi_dev, &data, sizeof(data));
> +}
> +
> +static int __maybe_unused ad7816_spi_write(struct ad7816_chip_info *chip, u8 data)
> +{
> +	int ret;
>  
> +	mutex_lock(&chip->io_lock);
> +	ret = ad7816_spi_write_locked(chip, data);
> +	mutex_unlock(&chip->io_lock);
>  	return ret;
>  }
>  
> @@ -122,10 +142,13 @@ static ssize_t ad7816_show_mode(struct device *dev,
>  {
>  	struct iio_dev *indio_dev = dev_to_iio_dev(dev);
>  	struct ad7816_chip_info *chip = iio_priv(indio_dev);
> +	ssize_t ret;
> +
> +	mutex_lock(&chip->io_lock);
> +	ret = sysfs_emit(buf, "%s\n", chip->mode ? "power-save" : "full");
> +	mutex_unlock(&chip->io_lock);

This locking isn't required.  If you're deliberately racing against
yourself and you trick the driver into briefly printing that it's
in power-save mode instead of full then, you know, who cares?  You're
allowed to change the mode so you just pranked your ownself for no
reason.

>  
> -	if (chip->mode)
> -		return sprintf(buf, "power-save\n");
> -	return sprintf(buf, "full\n");
> +	return ret;
>  }
>  
>  static ssize_t ad7816_store_mode(struct device *dev,
> @@ -136,13 +159,18 @@ static ssize_t ad7816_store_mode(struct device *dev,
>  	struct iio_dev *indio_dev = dev_to_iio_dev(dev);
>  	struct ad7816_chip_info *chip = iio_priv(indio_dev);
>  
> -	if (strcmp(buf, "full") == 0) {
> -		gpiod_set_value(chip->rdwr_pin, 1);
> +	mutex_lock(&chip->io_lock);
> +	if (sysfs_streq(buf, "full")) {

Using sysfs_streq() is unrelated.  Don't mix unrelated stuff into the
same patch.

> +		gpiod_set_value_cansleep(chip->rdwr_pin, 1);
>  		chip->mode = AD7816_FULL;
> -	} else {
> -		gpiod_set_value(chip->rdwr_pin, 0);
> +	} else if (sysfs_streq(buf, "power-save")) {
> +		gpiod_set_value_cansleep(chip->rdwr_pin, 0);
>  		chip->mode = AD7816_PD;
> +	} else {
> +		mutex_unlock(&chip->io_lock);
> +		return -EINVAL;
>  	}
> +	mutex_unlock(&chip->io_lock);
>  
>  	return len;
>  }
> @@ -156,7 +184,7 @@ static ssize_t ad7816_show_available_modes(struct device *dev,
>  					   struct device_attribute *attr,
>  					   char *buf)
>  {
> -	return sprintf(buf, "full\npower-save\n");
> +	return sysfs_emit(buf, "full\npower-save\n");

Unrelated.

>  }
>  
>  static IIO_DEVICE_ATTR(available_modes, 0444, ad7816_show_available_modes,
> @@ -168,8 +196,13 @@ static ssize_t ad7816_show_channel(struct device *dev,
>  {
>  	struct iio_dev *indio_dev = dev_to_iio_dev(dev);
>  	struct ad7816_chip_info *chip = iio_priv(indio_dev);
> +	ssize_t ret;
>  
> -	return sprintf(buf, "%d\n", chip->channel_id);
> +	mutex_lock(&chip->io_lock);
> +	ret = sysfs_emit(buf, "%u\n", chip->channel_id);
> +	mutex_unlock(&chip->io_lock);
> +
> +	return ret;
>  }
>  
>  static ssize_t ad7816_store_channel(struct device *dev,
> @@ -190,17 +223,34 @@ static ssize_t ad7816_store_channel(struct device *dev,
>  		dev_err(&chip->spi_dev->dev, "Invalid channel id %lu for %s.\n",
>  			data, indio_dev->name);
>  		return -EINVAL;
> -	} else if (strcmp(indio_dev->name, "ad7818") == 0 && data > 1) {
> -		dev_err(&chip->spi_dev->dev,
> -			"Invalid channel id %lu for ad7818.\n", data);
> -		return -EINVAL;
> -	} else if (strcmp(indio_dev->name, "ad7816") == 0 && data > 0) {
> -		dev_err(&chip->spi_dev->dev,
> -			"Invalid channel id %lu for ad7816.\n", data);
> +	}
> +
> +	switch (chip->id) {
> +	case ID_AD7816:
> +		if (data > 0) {
> +			dev_err(&chip->spi_dev->dev,
> +				"Invalid channel id %lu for ad7816.\n", data);
> +			return -EINVAL;
> +		}
> +		break;
> +	case ID_AD7818:
> +		if (data > 1) {
> +			dev_err(&chip->spi_dev->dev,
> +				"Invalid channel id %lu for ad7818.\n", data);
> +			return -EINVAL;
> +		}
> +		break;
> +	case ID_AD7817:
> +		/* AD7817 allows all channels up to AD7816_CS_MAX */
> +		break;
> +	default:
> +		dev_err(&chip->spi_dev->dev, "Unknown chip id %lu\n", chip->id);
>  		return -EINVAL;
>  	}
>  
> -	chip->channel_id = data;
> +	mutex_lock(&chip->io_lock);
> +	chip->channel_id = (u8)data;

There are way too many unrelated things happening and a bunch of them are
controversial in ways that you aren't expecting.  For example, a lot of
people (me) hate pointless casts...  We already have bounds checked data
so there is no need to cast.  If we wanted to we could declare data as a
u8, and that would also be pointless since we have to bounds check it
anyway but at least it's better than casting.

Version 3 wasn't very far off from being correct and in version 4 you
fixed the remaining locking issues I saw.  But the patch does way too
many other things too.

regards,
dan carpenter


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ