linux-kernel - RE: Errant readings on LM81 with T2080 SoC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <97910de5fd8c46fea1a17f0bd2b76fbc@AcuMS.aculab.com>
Date:   Mon, 15 Mar 2021 09:46:21 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Chris Packham' <Chris.Packham@...iedtelesis.co.nz>,
        'Guenter Roeck' <linux@...ck-us.net>,
        Wolfram Sang <wsa@...nel.org>
CC:     "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
        "jdelvare@...e.com" <jdelvare@...e.com>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-i2c@...r.kernel.org" <linux-i2c@...r.kernel.org>
Subject: RE: Errant readings on LM81 with T2080 SoC

From: Chris Packham
> Sent: 14 March 2021 21:26
> 
> On 12/03/21 10:25 pm, David Laight wrote:
> > From: Linuxppc-dev Guenter Roeck
> >> Sent: 11 March 2021 21:35
> >>
> >> On 3/11/21 1:17 PM, Chris Packham wrote:
> >>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
> >>>>> Bummer. What is really weird is that you see clock stretching under
> >>>>> CPU load. Normally clock stretching is triggered by the device, not
> >>>>> by the host.
> >>>> One example: Some hosts need an interrupt per byte to know if they
> >>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
> >>>> clock.
> >>>>
> >>> It feels like something like that is happening. Looking at the T2080
> >>> Reference manual there is an interesting timing diagram (Figure 14-2 if
> >>> someone feels like looking it up). It shows SCL low between the ACK for
> >>> the address and the data byte. I think if we're delayed in sending the
> >>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
> >>>
> >> I think that really leaves you only two options that I can see:
> >> Rework the driver to handle critical actions (such as setting TXAK,
> >> and everything else that might result in clock stretching) in the
> >> interrupt handler, or rework the driver to handle everything in
> >> a high priority kernel thread.
> >
> > I'm not sure a high priority kernel thread will help.
> > Without CONFIG_PREEMPT (which has its own set of nasties)
> > a RT process won't be scheduled until the processor it last
> > ran on does a reschedule.
> > I don't think a kernel thread will be any different from a
> > user process running under the RT scheduler.
> >
> > I'm trying to remember the smbus spec (without remembering the I2C one).

> For those following along the spec is available here[0]. I know there's
> a 3.0 version[1] as well but the devices I'm dealing with are from a 2.0
> vintage.
> > While basically a clock+data bit-bang the slave is allowed to drive
> > the clock low to extend a cycle.
> > It may be allowed to do this at any point?
>
>  From what I can see it's actually the master extending the clock. Or
> more accurately holding it low between the address and data bytes (which
> from the T2080 reference manual looks expected). I think this may cause
> a strictly compliant SMBUS device to determine that Tlow:mext has been
> violated.

Yes, the spec does seem to assume that is a signal is stable
for 20ms something has gone 'horribly wrong'.
I wasn't worries about that, our fpga does the whole transaction
as a single command.
None of our slaves generate interrupts - so it is purely master/slave.

If you run your process under the RT scheduler it is unlikely
that pre-emption will be delayed by long enough to stop the process
running for 10ms.
I've seen >1ms delays (testing RTP audio), but most of the long
loops have a cond_resched() in them.

...

> Probably depends on the device implementation. I've got multiple other
> I2C/SMBUS devices and the LM81 seems to be the one that objects.

I bet most don't implement any of the timeouts.

I found one interesting pmbus device.
Sometimes it would detect a STOP condition because the data line
went high when it tri-stated its output driver in response to the
rising clock edge!
So it saw the same clock edge twice.

> [0] - http://www.smbus.org/specs/smbus20.pdf
> [1] - https://pmbus.org/Assets/PDFS/Public/SMBus_3_0_20141220.pdf

I should have both those - I've copied them to the directory where
I'd look for them first!

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)