Date:   Thu, 12 Oct 2017 07:42:11 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Peter Rosin <peda@...ntia.se>,
        Ludovic Desroches <ludovic.desroches@...rochip.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-i2c@...r.kernel.org" <linux-i2c@...r.kernel.org>,
        Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org
Subject: Re: Sluggish AT91 I2C driver causes SMBus timeouts

On 10/12/2017 07:32 AM, Peter Rosin wrote:
> On 2017-10-12 13:35, Peter Rosin wrote:
>> Hi!
>>
>> I have encountered an "interesting" bug. It silently corrupts data
>> and is generally nasty...
>>
>> On an I2C bus, driven by the at91 driver and DMA (an Atmel
>> sama5d31 chip), I have a 256-byte eeprom (NXP SE97BTP). I'm using
>> Linux v4.13.
>>
>> The at24 driver for the eeprom detects that the I2C adapter is
>> capable of I2C transactions and selects that over SMBus. Reads
>> are done in 128 byte chunks. However, sometimes there is some
>> kind of event that disturbs the transactions such that the very
>> last bit of the very last byte (and the following NACK and STOP)
>> of such chunks are delayed for a long time (the latest incident
>> shows 85ms on the scope). That is too long for the eeprom which
>> is expecting SMBus and times out after about 30 ms. When the
>> eeprom times out, it just releases the data line so that it is
>> pulled up high. The I2C driver does not notice this, and when it
>> finally gets going, it reads a one for the last bit instead of
>> the expected zero. Since it is the last byte of the read, a NACK
>> is expected and since the eeprom has timed out the NACK is there.
>> And the STOP condition also looks normal (expected since it is
>> generated by the driver itself). So, the driver has not noticed
>> anything funny. But the data is corrupted.
>>
>> I can work around this by disabling the SMBus timeout in the eeprom
>> with:
>>
>> 	i2cset -f 0 0x18 0x22 0x8100
>>
>> But that is done on a different I2C address (the eeprom is on
>> address 0x50), since the chip is a combined temperature sensor and
>> eeprom, and the SMBus timeout bit is of course in a temperature
>> sensor register.
>>
>> HOWEVER, I fail to see how this is limited to my case with this
>> eeprom. Any SMBus chip with a timeout will suffer the same fate.
>> The real bug is that this happens without the driver noticing it.
>> And why is there an 85 ms delay in the middle of the last byte?
>> Sure, I can see why there might be a delay before finishing up
>> with a STOP condition or between bytes if there needs to be some
>> DMA setup at some interval, but after the 7th bit of a byte?
>>
>> For a lot of transactions on the I2C bus there is no delay before
>> the last bit. And most of the time there is no delay for the
>> eeprom reads either; the delay only occurs when it feels like it.
>>
>> This does not feel good at all.
> 
> I added some traces to i2c-at91.c and, AFAIU, it's the call to
> at91_twi_read_data_dma_callback that sometimes arrives later than
> desired. Once the callback runs, the transfer completes swiftly.
> 
> After reading the comments in that driver I suppose the HW holds
> on to the last data-bit until it knows whether to ACK or NACK in
> the following bit.
> 
> But given this, it is questionable if this driver/HW combo can
> claim support for SMBus. But then again, I expect many things
> suffer from similar scheduling delays (presumably that's what's
> going on) so this is probably not special to i2c-at91.c...
> 
> Since this is probably a very generic problem and I just happened
> to hit it for the eeprom, I wonder if it would be ok to add a
> workaround, as below, to the temperature sensor driver part of this
> chip? (with suitable comments, defines for the constants etc -
> setting the 0x0080 bit in reg 0x22 disables the SMBus timeout)
> 
> Cheers,
> Peter
> 
> diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
> index 1bf22eff0b08..3e72bd8e06d1 100644
> --- a/drivers/hwmon/jc42.c
> +++ b/drivers/hwmon/jc42.c
> @@ -416,6 +416,13 @@ static int jc42_detect(struct i2c_client *client, struct i2c_board_info *info)
>   	if ((cap & 0xff00) || (config & 0xf800))
>   		return -ENODEV;
>   
> +	if (manid == NXP_MANID && (devid & SE97_DEVID_MASK) == SE97_DEVID) {
> +		int smbus = i2c_smbus_read_word_swapped(client, 0x22);
> +		if (smbus < 0)
> +			return -ENODEV;
> +		i2c_smbus_write_word_swapped(client, 0x22, smbus | 0x0080);
> +	}
> +

Ouch. Not like that; this would affect every board with this chip, not just this one.
We would need something like a DT property to do that (smbus-timeout-disable is used
in other drivers).

... and definitely not in the detect function. This would have to be done in probe.
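
A rough userspace sketch of that shape, for illustration only and not the actual jc42 change: the timeout bit is touched in a probe-style function, and only when the board asked for it. The names fake_chip, fake_read_word, fake_write_word, sketch_probe and the two macros are hypothetical; the register number 0x22 and the bit 0x0080 are the ones quoted above. In a real driver the boolean would presumably come from a device property lookup and the accessors would be the swapped SMBus word helpers used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the thread; the macro names are made up for this sketch. */
#define SE97_SMBUS_REG        0x22
#define SE97_SMBUS_TMOUT_DIS  0x0080

/* Hypothetical stand-in for the chip: a plain register file in memory. */
struct fake_chip {
	uint16_t regs[256];
};

/* Stand-ins for the bus accessors; values are in chip (MSB-first) order. */
static int fake_read_word(struct fake_chip *chip, uint8_t reg)
{
	return chip->regs[reg];
}

static int fake_write_word(struct fake_chip *chip, uint8_t reg, uint16_t val)
{
	chip->regs[reg] = val;
	return 0;
}

/*
 * Probe-time handling: leave the chip alone unless the board description
 * requested the timeout to be disabled (modelled here as a plain bool).
 */
static int sketch_probe(struct fake_chip *chip, bool timeout_disable_prop)
{
	int smbus;

	if (!timeout_disable_prop)
		return 0;

	smbus = fake_read_word(chip, SE97_SMBUS_REG);
	if (smbus < 0)
		return smbus;

	/* Read-modify-write so the other bits in the register are preserved. */
	return fake_write_word(chip, SE97_SMBUS_REG,
			       (uint16_t)smbus | SE97_SMBUS_TMOUT_DIS);
}
```

Calling sketch_probe() with the flag false leaves register 0x22 unchanged, so boards that do not opt in keep the chip's default timeout behaviour.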

Guenter
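
The i2cset value 0x8100 and the 0x0080 mask in the patch can look unrelated at first glance. Assuming the i2cset command performed an SMBus word write (low byte first on the wire) and that the chip takes the first byte as the most significant one, which is what the *_swapped helpers in the patch imply, both refer to the same bit. A small sketch of the byte swap, with hypothetical helper names swap16 and timeout_bit_set:

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit word (same effect as the kernel's swab16). */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Test the timeout-disable bit on a value in chip (MSB-first) order. */
static int timeout_bit_set(uint16_t chip_order_value)
{
	return (chip_order_value & 0x0080) != 0;
}
```

Under those assumptions swap16(0x8100) is 0x0081, which has the 0x0080 bit set; it also has bit 0 set, whose meaning is not discussed in this thread.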
