linux-kernel - Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata

Message-ID: <ecc90a62-7cfa-45c9-9f6c-188e2c8ac50f@zonque.org>
Date:   Wed, 15 Nov 2023 12:22:10 +0100
From:   Daniel Mack <daniel@...que.org>
To:     Lech Perczak <lech.perczak@...lingroup.com>,
        Hugo Villeneuve <hugo@...ovil.com>
Cc:     gregkh@...uxfoundation.org, jirislaby@...nel.org,
        u.kleine-koenig@...gutronix.de, linux-serial@...r.kernel.org,
        linux-kernel@...r.kernel.org, Maxim Popov <maxim.snafu@...il.com>,
        stable@...r.kernel.org
Subject: Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata

Hi Lech,

On 11/15/23 11:51, Lech Perczak wrote:
> On 14.11.2023 at 16:55, Daniel Mack wrote:
>> Hi Hugo,
>>
>> On 11/14/23 16:20, Hugo Villeneuve wrote:
>>> On Tue, 14 Nov 2023 08:49:04 +0100
>>> Daniel Mack <daniel@...que.org> wrote:
>>>> This device has a silicon bug that makes it report a timeout interrupt
>>>> but no data in FIFO.
>>>>
>>>> The datasheet states the following in the errata section 18.1.4:
>>>>
>>>>   "If the host reads the receive FIFO at the same time as a
>>>>   time-out interrupt condition happens, the host might read 0xCC
>>>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>>>>   of the Line Status Register (LSR) is not set (means there is not
>>>>   data in the receive FIFO)."
>>>>
>>>> When this happens, the loop in sc16is7xx_irq() will run forever,
>>>> which effectively blocks the i2c bus and breaks the functionality
>>>> of the UART.
>>>>
>>>> From the information above, it is assumed that when the bug is
>>>> triggered, the FIFO does in fact have payload in its buffer, but the
>>>> fill level reporting is off-by-one. Hence this patch fixes the issue
>>>> by reading one byte from the FIFO when that condition is detected.
>>> From what I understand from the errata, when the problem occurs, it
>>> affects bit 0 of the LSR register. I see no mention that it
>>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
>> True, the errata doesn't explicitly mention that, but tests have shown
>> that the RXLVL register is equally affected.
>>
>>> LSR[0] would be checked only if we were using polled mode of
>>> operation, but we always use the interrupt mode (IRQ), and therefore I
>>> would say that this errata doesn't apply to this driver, and the
>>> patch is not necessary...
>> Well, it is. We have seen this bug in the wild and extensively
>> stress-tested the patch on dozens of boards for many days. Without this
>> patch, kernels on affected systems would consume a lot of CPU cycles in
>> the interrupt threads and effectively render the I2C bus unusable due to
>> the busy polling.
>>
>> With this patch applied, we were no longer able to reproduce the issue.
> Could you share some more details on the setup you use to reproduce this? I'd like to try it out as well.

We have boards with two I2C buses, each with an SC16IS752IBS. The UARTs
are configured in infrared mode, and they send and receive IR signals
constantly. I guess the same would happen with other electrical
interfaces, but the important bit is that the UARTs see a steady stream
of inbound data.

The bug has hit us on production units. When it does, sc16is7xx_irq()
spins forever: sc16is7xx_port_irq() keeps seeing an interrupt in the
IIR register that is never cleared, because the driver does not call
into sc16is7xx_handle_rx() unless the RXLVL register reports at least
one byte in the FIFO.
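
To make the failure mode concrete, here is a minimal userspace sketch of
the situation. It is not driver code: struct fake_uart, read_iir(),
read_rhr(), port_irq() and run_irq() are hypothetical names, and the
model assumes (as the patch does) that one byte is really in the FIFO
while RXLVL reports zero, and that reading it clears the time-out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative interrupt IDs, not taken from the driver headers. */
#define IIR_NO_INT     0x01 /* bit 0 set: nothing pending */
#define IIR_RX_TIMEOUT 0x0c /* RX time-out (0xCC with the FIFO-enable bits) */

/* Hypothetical model of one UART channel in the errata state. */
struct fake_uart {
	bool   timeout_pending; /* latched time-out interrupt */
	size_t rxlvl;           /* fill level the chip reports */
	size_t fifo_actual;     /* bytes really held in the FIFO */
};

static uint8_t read_iir(const struct fake_uart *u)
{
	return u->timeout_pending ? IIR_RX_TIMEOUT : IIR_NO_INT;
}

/* Assumption: reading the receive register pops a byte and clears the time-out. */
static uint8_t read_rhr(struct fake_uart *u)
{
	if (u->fifo_actual)
		u->fifo_actual--;
	if (u->rxlvl)
		u->rxlvl--;
	u->timeout_pending = false;
	return 0;
}

/* One handler pass; returns true while an interrupt is still reported. */
static bool port_irq(struct fake_uart *u, bool workaround)
{
	uint8_t iir = read_iir(u);

	if (iir & IIR_NO_INT)
		return false;

	if (iir == IIR_RX_TIMEOUT) {
		size_t rxlen = u->rxlvl;

		/* Errata case: time-out flagged, but the fill level reads zero. */
		if (workaround && rxlen == 0)
			rxlen = 1;

		while (rxlen--)
			read_rhr(u);
	}
	return true;
}

/* Loop like the IRQ thread does, capped so the demo terminates. */
static unsigned run_irq(struct fake_uart *u, bool workaround, unsigned max_passes)
{
	unsigned passes = 0;

	while (passes < max_passes && port_irq(u, workaround))
		passes++;
	return passes;
}
```

Starting from { .timeout_pending = true, .rxlvl = 0, .fifo_actual = 1 },
run_irq() without the workaround only stops at the cap, which is the
spin we see. With the workaround, the first pass reads the one byte and
the second pass finds nothing pending.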

Note that this issue might only occur in revision E of the silicon. And
there seems to be no way to read the revision code through I2C, so I
guess you won't be able to figure out easily whether your chip is affected.

Let me know if I can provide more information.


Thanks,
Daniel