Message-ID: <20250227224738.6d7ebd8e@akair>
Date: Thu, 27 Feb 2025 22:47:38 +0100
From: Andreas Kemnade <andreas@...nade.info>
To: Nishanth Menon <nm@...com>
Cc: Andi Shyti <andi.shyti@...nel.org>, <vigneshr@...com>,
 <aaro.koskinen@....fi>, <khilman@...libre.com>, <rogerq@...nel.org>,
 <tony@...mide.com>, <jmkrzyszt@...il.com>, <reidt@...com>,
 <wsa@...nel.org>, <linux-omap@...r.kernel.org>,
 <linux-i2c@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
 <stable@...nel.org>
Subject: Re: [PATCH] i2c: omap: fix IRQ storms

On Thu, 27 Feb 2025 08:20:55 -0600,
Nishanth Menon <nm@...com> wrote:

> On 10:08-20250220, Andreas Kemnade wrote:
> > On Wed, 19 Feb 2025 20:22:13 +0100,
> > Andi Shyti <andi.shyti@...nel.org> wrote:
> >   
> > > Hi,
> > > 
> > > On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:  
> > > > On the GTA04A5 writing a reset command to the gyroscope causes IRQ
> > > > storms because NACK IRQs are enabled and therefore triggered but not
> > > > acked.
> > > > 
> > > > Sending a reset command to the gyroscope by
> > > > i2cset 1 0x69 0x14 0xb6
> > > > with an additional debug print in the ISR (not the thread) itself
> > > > causes
> > > > 
> > > > [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
> > > > [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
> > > > [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
> > > > [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > > > [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > repeating till infinity
> > > > [...]
> > > > (0x2 = NACK, 0x100 = Bus free, which is not enabled)
> > > > Apparently no other IRQ bit gets set, so this stalls.
> > > > 
> > > > Do not ignore enabled interrupts and make sure they are acked.
> > > > If the NACK IRQ is not needed, it should simply not be enabled, but
> > > > according to the above log, caring about it is necessary unless
> > > > the Bus free IRQ is enabled and handled. The assumption that it
> > > > will always come with an ARDY IRQ, which was the idea behind
> > > > ignoring it, proves wrong.
> > > > It is true for simple reads from an unused address.
> > > > 
> > > > So revert
> > > > commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
> > > > 
> > > > The offending commit was meant to reduce false detections in
> > > > i2cdetect. i2cdetect itself warns that it can confuse the I2C bus,
> > > > so some rare false detections (I have never seen any on my systems)
> > > > are the lesser evil compared to the system basically hanging
> > > > completely.
> > > > 
> > > > No more details have come to light in the corresponding email thread
> > > > for several months:
> > > > https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
> > > > so no better fix to solve both problems can be developed right now.
> > > 
> > > I need someone from TI or someone who can test to ack here.
> > > 
> > > Can someone help?
> > >  
> > The original (IMHO minor) problem that c770657bd261 was supposed to
> > fix is hard to test; I have never seen it on any system I have access
> > to (and as a platform maintainer I have a bunch of them).
> > There is not much description anywhere of the system on which the
> > original problem occurred, and there has been no reaction from the
> > author for several months, so I do not see anything that can be done.
> > Maybe it was just faulty hardware.
> > 
> > As said in the commit message, reverting it should be the lesser evil.
> > And that state was tested for many years.
> 
> Can we not handle this slightly differently? Keep the fix, but based on
> compatible? We know that the i2c controller changed over time. The
> i2cdetect bug fixed by c770657bd261 was especially hard to find and fix.
> 
Looking a bit deeper:
Why do we have omap_i2c_isr at all? Can there be any case where
(stat & mask) == 0 there (without c770657bd261 applied)?
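
To make that question concrete, here is a small standalone model of the
hard-IRQ decision. It is not the driver code. The two bit values come from
the log above; the macro and function names are made up for illustration,
and the second function only reflects my assumption about what ignoring a
lone NACK amounts to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as decoded in the log; the names are illustrative only. */
#define STAT_NACK 0x0002u
#define STAT_BF   0x0100u /* bus free, not enabled in the mask */

/* Plain check: wake the thread whenever any enabled bit is pending. */
static bool wake_thread_plain(uint16_t stat, uint16_t mask)
{
	return (stat & mask) != 0;
}

/*
 * Assumed check with NACK ignored: a status whose only enabled bit is
 * NACK does not wake the thread, so nothing acks it.
 */
static bool wake_thread_ignore_nack(uint16_t stat, uint16_t mask)
{
	return (stat & mask & (uint16_t)~STAT_NACK) != 0;
}
```

With a hypothetical mask of 0x001f and the repeating status 0x0102 from
the log, the plain check wakes the thread while the NACK-ignoring one does
not, which would match the endless "IRQ LL" lines.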

I looked at omap_i2c_xfer_data() and nothing interesting seems to
happen when no bits other than OMAP_I2C_STAT_NACK are set.
Looking again, things get interesting when that loop is left.

Maybe just ack NACK, set cmd_err and return -EAGAIN if no other
bits are set. That should not change scenarios where NACK
comes with other bits set. Let's check whether that fixes the
mess I see here. Well, anything is better than having that IRQ going
mad.
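
A rough standalone sketch of that idea, assuming a fake device struct
instead of the real registers. struct fake_i2c, ack_stat() and
handle_lone_nack() are hypothetical names, not taken from the driver, and
an actual patch may well look different.

```c
#include <errno.h>
#include <stdint.h>

#define STAT_NACK 0x0002u /* value from the log; name is illustrative */

/* Hypothetical stand-in for the controller state. */
struct fake_i2c {
	uint16_t stat;    /* pending status bits */
	uint16_t cmd_err; /* error bits reported to the transfer code */
};

/* Model of acking: clear the given status bits. */
static void ack_stat(struct fake_i2c *dev, uint16_t bits)
{
	dev->stat = (uint16_t)(dev->stat & ~bits);
}

/*
 * If NACK is the only enabled bit pending, ack it, record the error and
 * return -EAGAIN. Otherwise return 0 and leave everything untouched so
 * the normal handling runs unchanged.
 */
static int handle_lone_nack(struct fake_i2c *dev, uint16_t mask)
{
	uint16_t pending = dev->stat & mask;

	if (pending != STAT_NACK)
		return 0;

	ack_stat(dev, STAT_NACK);
	dev->cmd_err |= STAT_NACK;
	return -EAGAIN;
}
```

The point of the early return is that a NACK arriving together with other
enabled bits takes exactly the same path as before.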

For reference, the sensor involved was the BMG160. Because it is not
enabled in omap2plus_defconfig, the issue did not show up earlier.
From my understanding, there is a NACK after the reset command
data byte is sent. @Nikolaus: are there any nice and simple test points
for a scope?

Do you have any chance to test such a scenario on a device that
requires c770657bd261?

Regards,
Andreas
