linux-kernel - Re: fsl_espi errors on v5.7.15

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <ee6a85fd-35d3-2adf-02ea-35009749194b@alliedtelesis.co.nz>
Date:   Sun, 6 Sep 2020 21:03:46 +0000
From:   Chris Packham <Chris.Packham@...iedtelesis.co.nz>
To:     Heiner Kallweit <hkallweit1@...il.com>
CC:     Nicholas Piggin <npiggin@...il.com>,
        "benh@...nel.crashing.org" <benh@...nel.crashing.org>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        "mpe@...erman.id.au" <mpe@...erman.id.au>,
        "paulus@...ba.org" <paulus@...ba.org>
Subject: Re: fsl_espi errors on v5.7.15

(resend as something decided to insert html, some context stripped)

On 5/09/20 5:23 am, Heiner Kallweit wrote:
> On Fri 4. Sep 2020 at 01:58, Chris Packham 
> <Chris.Packham@...iedtelesis.co.nz 
> <mailto:Chris.Packham@...iedtelesis.co.nz>> wrote:
>
>
<snip>
>
>
>     I tried ftrace but I really wasn't sure what I was looking for.
>
>     Capturing a "bad" case was pretty tricky. But I think I've
>     identified a
>
>     fix (I'll send it as a proper patch shortly). The gist is
>
>
>
>     diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c
>
>     index 7e7c92cafdbb..cb120b68c0e2 100644
>
>     --- a/drivers/spi/spi-fsl-espi.c
>
>     +++ b/drivers/spi/spi-fsl-espi.c
>
>     @@ -574,13 +574,14 @@ static void fsl_espi_cpu_irq(struct fsl_espi
>
>     *espi, u32 events)
>
>       static irqreturn_t fsl_espi_irq(s32 irq, void *context_data)
>
>       {
>
>              struct fsl_espi *espi = context_data;
>
>     -       u32 events;
>
>     +       u32 events, mask;
>
>
>
>              spin_lock(&espi->lock);
>
>
>
>              /* Get interrupt events(tx/rx) */
>
>              events = fsl_espi_read_reg(espi, ESPI_SPIE);
>
>     -       if (!events) {
>
>     +       mask = fsl_espi_read_reg(espi, ESPI_SPIM);
>
>     +       if (!(events & mask)) {
>
>                      spin_unlock(&espi->lock);
>
>                      return IRQ_NONE;
>
>              }
>
>
>
>     The SPIE register contains the TXCNT so events is pretty much always
>
>     going to have something set. By checking events against what we've
>
>     actually requested interrupts for we don't see any spurious events.
>
>
> Usually we shouldn’t receive interrupts we’re not interested in, 
> except the interrupt is shared.
My theory is that we get an interrupt to the core for RXT and another 
for DON. With the changes to the powerpc interrupt handling and the fact 
that fsl_espi_cpu_irq() doesn't actually look at the specific event bits 
means that sometimes both events are handled in the processing of the 
first interrupt and the second one trips the SPI_DON not set error.

There's an old comment "SPI bus sometimes got lost interrupts..." which 
makes me wonder if the interrupt handling change has fixed this original 
problem.

I still think the change makes sense regardless because the SPIE 
register has some counter fields in it.

> This leads to the question: is the SPI interrupt shared with another 
> device on your system?
It's not shared on either the custom board or the T2080RDB.
> Do you see spurious interrupts with the patch under 
> /proc/irq/(irq)/spurious?

Yes it looks like it

[root@...uxbox ~]# cat /proc/irq/53/spurious
count 3126
unhandled 0
last_unhandled 0 ms

[root@...uxbox ~]# /flash/dd_test.sh

[root@...uxbox ~]# cat /proc/irq/53/spurious
count 1016
unhandled 0
last_unhandled 4294746100 ms

[root@...uxbox ~]# /flash/dd_test.sh

[root@...uxbox ~]# cat /proc/irq/53/spurious
count 88391
unhandled 0
last_unhandled 4294746100 ms

[root@...uxbox ~]# /flash/dd_test.sh

[root@...uxbox ~]# cat /proc/irq/53/spurious
count 72459
unhandled 2
last_unhandled 4294758632 ms
>
>
>
>     I've tested this on the T2080RDB and on our custom hardware and it
>     seems
>
>     to resolve the problem.
>
>
>