linux-kernel - Re: Errant readings on LM81 with T2080 SoC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d37a114c-fa3f-40e8-4d85-52eb1ff03c37@roeck-us.net>
Date:   Wed, 10 Mar 2021 23:41:33 -0800
From:   Guenter Roeck <linux@...ck-us.net>
To:     Chris Packham <Chris.Packham@...iedtelesis.co.nz>,
        "jdelvare@...e.com" <jdelvare@...e.com>
Cc:     "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-i2c@...r.kernel.org" <linux-i2c@...r.kernel.org>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: Errant readings on LM81 with T2080 SoC

On 3/10/21 1:48 PM, Chris Packham wrote:
> 
> On 10/03/21 6:06 pm, Guenter Roeck wrote:
>> On 3/9/21 6:19 PM, Chris Packham wrote:
>>> On 9/03/21 9:27 am, Chris Packham wrote:
>>>> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>>>>> Other than that, the only other real idea I have would be to monitor
>>>>> the i2c bus.
>>>> I am in the fortunate position of being able to go into the office and
>>>> even happen to have the expensive scope at the moment. Now I just need
>>>> to find a tame HW engineer so I don't burn myself trying to attach the
>>>> probes.
>>> One thing I see on the scope is that when there is a CPU load there
>>> appears to be some clock stretching going on (SCL is held low some
>>> times). I don't see it without the CPU load. It's hard to correlate a
>>> clock stretching event with a bad read or error but it is one area where
>>> the SMBUS spec has a maximum that might cause the device to give up waiting.
>>>
>> Do you have CONFIG_PREEMPT enabled in your kernel ? But even without
>> that it is possible that the hot loops at the beginning and end of
>> each operation mess up the driver and cause it to sleep longer
>> than intended. Did you try usleep_range() ?
> 
> I've been running with and without CONFIG_PREEMPT. The failures happen 
> with both.
> 
> I did try usleep_range() and still saw failures.
> 

Bummer. What is really weird is that you see clock stretching under
CPU load. Normally clock stretching is triggered by the device, not
by the host. I wonder if there are some timing differences before
the clock stretching happens.

Anyway, I just sent a set of three patches to the list; maybe you
can give it a try. The patches convert the driver to the with_info
API and drop local caching.

The code is module tested with the register dumps I have available
for adm9240 and lm81, but it would be great to get test coverage
on real hardware. I don't really expect it to solve your problem,
but it does reduce and modify the load on the chip (because
registers are no longer read in bursts), so it may have some
positive impact.

>> On a side note, can you send me a register dump for the lm81 ?
>> It would be useful for my module test code.
> 
> Here you go this is from a largely unconfigured LM81
> 

Thanks, that helped a lot!

Guenter