lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 25 Aug 2020 22:22:21 +0000
From:   Chris Packham <Chris.Packham@...iedtelesis.co.nz>
To:     Heiner Kallweit <hkallweit1@...il.com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "mpe@...erman.id.au" <mpe@...erman.id.au>,
        "benh@...nel.crashing.org" <benh@...nel.crashing.org>,
        "paulus@...ba.org" <paulus@...ba.org>
CC:     "linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: fsl_espi errors on v5.7.15

On 25/08/20 7:22 pm, Heiner Kallweit wrote:

<snip>
> I've been staring at spi-fsl-espi.c for while now and I think I've
>> identified a couple of deficiencies that may or may not be related to my
>> issue.
>>
>> First I think the 'Transfer done but SPIE_DON isn't set' message can be
>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register.
>> We also write back to it to clear the current events. We re-read it in
>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>> naturally end up in that situation if we're doing a large read. Consider
>> the messages for reading a block of data from a spi-nor chip
>>
>>    tx = READ_OP + ADDR
>>    rx = data
>>
>> We setup the transfer and pump out the tx_buf. The first interrupt goes
>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>> continues until we've received all the data and we finish with ESPI_SPIE
>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>> isn't set.
>>
>> The other deficiency is that we only get an interrupt when the amount of
>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>> FSL_ESPI_RXTHR left to be received we will never pull them out of the fifo.
>>
> SPIM_DON will trigger an interrupt once the last characters have been
> transferred, and read the remaining characters from the FIFO.

The T2080RM that I have says the following about the DON bit

"Last character was transmitted. The last character was transmitted and 
a new command can be written for the next frame."

That does at least seem to fit with my assertion that it's all about the 
TX direction. But the fact that it doesn't happen all the time throws 
some doubt on it.

> I think the reason I'm seeing some variability is because of how fast
>> (or slow) the interrupts get processed and how fast the spi-nor chip can
>> fill the CPUs rx fifo.
>>
> To rule out timing issues at high bus frequencies I initially asked
> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
> or even less, then timing shouldn't be an issue.
Yes I've currently got spi-max-frequency = <1000000>; in my dts. I would 
also expect a slower frequency would fit my "DON is for TX" narrative.
> Last relevant functional changes have been done almost 4 years ago.
> And yours is the first such report I see. So question is what could be so
> special with your setup that it seems you're the only one being affected.
> The scenarios you describe are standard, therefore much more people
> should be affected in case of a driver bug.
Agreed. But even on my hardware (which may have a latent issue despite 
being in the field for going on 5 years) the issue only triggers under 
some fairly specific circumstances.
> You said that kernel config impacts how frequently the issue happens.
> Therefore question is what's the diff in kernel config, and how could
> the differences be related to SPI.

It did seem to be somewhat random. Things like CONFIG_PREEMPT have an 
impact but every time I found something that seemed to be having an 
impact I've been able to disprove it. I actually think its about how 
busy the system is which may or may not affect when we get round to 
processing the interrupts.

I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
occur on the T2080RDB.

I've had to add the following to expose the environment as a mtd partition

diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
index ff87e67c70da..fbf95fc1fd68 100644
--- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
@@ -116,6 +116,15 @@ flash@0 {
                                 compatible = "micron,n25q512ax3", 
"jedec,spi-nor";
                                 reg = <0>;
                                 spi-max-frequency = <10000000>; /* 
input clock */
+
+                               partition@...oot {
+                                        reg = <0x00000000 0x00100000>;
+                                        label = "u-boot";
+                                };
+                                partition@...oot-env {
+                                        reg = <0x00100000 0x00010000>;
+                                        label = "u-boot-env";
+                                };
                         };
                 };

And I'm using the following script to poke at the environment (warning 
if anyone does try this and the bug hits it can render your u-boot 
environment invalid).

cat flash/fw_env_test.sh
#!/bin/sh

generate_fw_env_config()
{
   cat /proc/mtd | sed 's/[:"]//g' | while read dev size erasesize name ; do
      echo "$dev $size $erasesize $name"
      [ "$name" = "u-boot-env" ] && echo "/dev/$dev 0x0000 0x2000 
$erasesize" >/flash/fw_env.config
   done
}

cycles=10
[ $# -ge 1 ] && cycles=$1

generate_fw_env_config

fw_printenv -c /flash/fw_env.config

dmesg -c >/dev/null
x=0
while [ $x -lt $cycles ]; do
     fw_printenv -c /flash/fw_env.config >/dev/null || break
     fw_setenv -c /flash/fw_env.config foo $RANDOM || break;
     dmesg -c | grep -q fsl_espi && break;
     let x=x+1
done

echo "Ran $x cycles"

Powered by blists - more mailing lists