Message-ID: <ecc90a62-7cfa-45c9-9f6c-188e2c8ac50f@zonque.org>
Date: Wed, 15 Nov 2023 12:22:10 +0100
From: Daniel Mack <daniel@...que.org>
To: Lech Perczak <lech.perczak@...lingroup.com>,
Hugo Villeneuve <hugo@...ovil.com>
Cc: gregkh@...uxfoundation.org, jirislaby@...nel.org,
u.kleine-koenig@...gutronix.de, linux-serial@...r.kernel.org,
linux-kernel@...r.kernel.org, Maxim Popov <maxim.snafu@...il.com>,
stable@...r.kernel.org
Subject: Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Hi Lech,
On 11/15/23 11:51, Lech Perczak wrote:
> On 14.11.2023 at 16:55, Daniel Mack wrote:
>> Hi Hugo,
>>
>> On 11/14/23 16:20, Hugo Villeneuve wrote:
>>> On Tue, 14 Nov 2023 08:49:04 +0100
>>> Daniel Mack <daniel@...que.org> wrote:
>>>> This device has a silicon bug that makes it report a timeout interrupt
>>>> but no data in FIFO.
>>>>
>>>> The datasheet states the following in the errata section 18.1.4:
>>>>
>>>> "If the host reads the receive FIFO at the same time as a
>>>> time-out interrupt condition happens, the host might read 0xCC
>>>> (time-out) in the Interrupt Indication Register (IIR), but bit 0
>>>> of the Line Status Register (LSR) is not set (means there is not
>>>> data in the receive FIFO)."
>>>>
>>>> When this happens, the loop in sc16is7xx_irq() will run forever,
>>>> which effectively blocks the i2c bus and breaks the functionality
>>>> of the UART.
>>>>
>>>> From the information above, it is assumed that when the bug is
>>>> triggered, the FIFO does in fact have payload in its buffer, but the
>>>> fill level reporting is off-by-one. Hence this patch fixes the issue
>>>> by reading one byte from the FIFO when that condition is detected.
>>> From what I understand from the errata, when the problem occurs, it
>>> affects bit 0 of the LSR register. I see no mention that it
>>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
>> True, the errata doesn't explicitly mention that, but tests have shown
>> that the RXLVL register is equally affected.
>>
>>> LSR[0] would be checked only if we were using polled mode of
>>> operation, but we always use the interrupt mode (IRQ), and therefore I
>>> would say that this errata doesn't apply to this driver, and the
>>> patch is not necessary...
>> Well, it is. We have seen this bug in the wild and extensively
>> stress-tested the patch on dozens of boards for many days. Without this
>> patch, kernels on affected systems would consume a lot of CPU cycles in
>> the interrupt threads and effectively render the I2C bus unusable due to
>> the busy polling.
>>
>> With this patch applied, we were no longer able to reproduce the issue.
> Could you share some more details on the setup you use to reproduce this? I'd like to try it out as well.
We have boards with 2 I2C busses with an SC16IS752IBS on both. The UARTs
are configured in infrared mode, and they send and receive IR signals
constantly. I guess the same would happen with other electrical
interfaces, but the important bit is that the UARTs see a steady stream
of inbound data.
The bug has hit us on production units, and when it does, sc16is7xx_irq()
spins forever: sc16is7xx_port_irq() keeps seeing an interrupt in the IIR
register that is never cleared, because the driver does not call into
sc16is7xx_handle_rx() unless the RXLVL register reports at least one
byte in the FIFO.
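To make the failure mode concrete, here is a minimal stand-alone sketch in C.
It is not the driver code: struct fake_uart, read_iir(), read_fifo() and
service_irq() are hypothetical names made up for illustration, and the model
simply assumes that reading one byte from the FIFO clears the stuck timeout
interrupt, as our tests suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; this is a toy model, not the driver's headers. */
#define IIR_NO_INT 0x01 /* no interrupt pending */
#define IIR_RTOI   0x0c /* RX timeout interrupt */

/* Hypothetical model of the chip state seen by the host. */
struct fake_uart {
	unsigned int rxlvl;   /* fill level as reported by the chip */
	bool timeout_pending; /* timeout interrupt is latched */
};

static unsigned int read_iir(const struct fake_uart *u)
{
	return u->timeout_pending ? IIR_RTOI : IIR_NO_INT;
}

/* Assumption: any read from the RX FIFO clears the timeout condition. */
static void read_fifo(struct fake_uart *u, unsigned int n)
{
	assert(u != NULL);
	u->rxlvl = (n >= u->rxlvl) ? 0 : u->rxlvl - n;
	u->timeout_pending = false;
}

/*
 * Service interrupts until IIR reports none. Returns the number of passes,
 * or max_passes if the interrupt never clears (the "spin forever" case).
 */
static unsigned int service_irq(struct fake_uart *u, bool workaround,
				unsigned int max_passes)
{
	unsigned int passes = 0;

	while (passes < max_passes) {
		unsigned int iir = read_iir(u);
		unsigned int rxlen;

		if (iir == IIR_NO_INT)
			break;
		passes++;

		rxlen = u->rxlvl;
		/* Errata case: timeout flagged, but fill level reads zero. */
		if (workaround && iir == IIR_RTOI && rxlen == 0)
			rxlen = 1; /* assume the level is off by one */

		if (rxlen)
			read_fifo(u, rxlen);
	}
	return passes;
}
```

With timeout_pending set and rxlvl at zero, service_irq() without the
workaround runs until it hits max_passes, whereas with the workaround it
returns after a single pass.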
Note that this issue might only occur in revision E of the silicon. And
there seems to be no way to read the revision code through I2C, so I
guess you won't be able to figure out easily whether your chip is affected.
Let me know if I can provide more information.
Thanks,
Daniel