lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 25 Aug 2020 03:54:42 +0000
From:   Chris Packham <Chris.Packham@...iedtelesis.co.nz>
To:     Heiner Kallweit <hkallweit1@...il.com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "mpe@...erman.id.au" <mpe@...erman.id.au>,
        "benh@...nel.crashing.org" <benh@...nel.crashing.org>,
        "paulus@...ba.org" <paulus@...ba.org>
CC:     "linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: fsl_espi errors on v5.7.15


On 25/08/20 10:04 am, Chris Packham wrote:
>
> On 20/08/20 9:08 am, Chris Packham wrote:
>>
>> On 19/08/20 6:15 pm, Heiner Kallweit wrote:
>>> On 19.08.2020 00:44, Chris Packham wrote:
>>>> Hi Again,
>>>>
>>>> On 17/08/20 9:09 am, Chris Packham wrote:
>>>>
>>>>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>>>>>> On 14.08.2020 04:48, Chris Packham wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm seeing a problem with accessing spi-nor after upgrading a T2081
>>>>>>> based system to linux v5.7.15
>>>>>>>
>>>>>>> For this board u-boot and the u-boot environment live on spi-nor.
>>>>>>>
>>>>>>> When I use fw_setenv from userspace I get the following kernel logs
>>>>>>>
>>>>>>> # fw_setenv foo=1
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
>>>>>>> empty!
>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
>>>>>>> empty!
>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
>>>>>>> empty!
>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>>>> ...
>>>>>>>
>>>>>> This error reporting doesn't exist yet in 4.4. So you may have an 
>>>>>> issue
>>>>>> under 4.4 too, it's just not reported.
>>>>>> Did you verify that under 4.4 fw_setenv actually has an effect?
>>>>> Just double checked and yes under 4.4 the setting does get saved.
>>>>>>> If I run fw_printenv (before getting it into a bad state) it is 
>>>>>>> able to
>>>>>>> display the content of the boards u-boot environment.
>>>>>>>
>>>>>> This might indicate an issue with spi being locked. I've seen 
>>>>>> related
>>>>>> questions, just use the search engine of your choice and check for
>>>>>> fw_setenv and locked.
>>>>> I'm running a version of fw_setenv which includes
>>>>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it 
>>>>> shouldn't
>>>>> be locking things unnecessarily.
>>>>>>> If been unsuccessful in producing a setup for bisecting the 
>>>>>>> issue. I do
>>>>>>> know the issue doesn't occur on the old 4.4.x based kernel but 
>>>>>>> that's
>>>>>>> probably not much help.
>>>>>>>
>>>>>>> Any pointers on what the issue (and/or solution) might be.
>>>> I finally managed to get our board running with a vanilla kernel. With
>>>> corenet64_smp_defconfig I occasionally see
>>>>
>>>>     fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>
>>>> other than the message things seem to be working.
>>>>
>>>> With a custom defconfig I see
>>>>
>>>>     fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>     fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
>>>> empty!
>>>>     fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>     ...
>>>>
>>>> and access to the spi-nor does not work until the board is reset.
>>>>
>>>> I'll try and pick apart the differences between the two defconfigs.
>>
>> I now think my earlier testing is invalid. I have seen the problem 
>> with either defconfig if I try hard enough. I had convinced myself 
>> that the problem was CONFIG_PREEMPT but that was before I found 
>> boot-to-boot differences with the same kernel.
>>
>> It's possible that I'm chasing multiple issues with the same symptom.
>>
>> The error I'm most concerned with is in the sequence
>> 1. boot with old image
>> 2. write environment
>> 3. boot with new image
>> 4. write environment
>> 5. write fails and environment is corrupted
>>
>> After I recover the system things sometimes seem fine. Until I repeat 
>> the sequence above.
>>
>>> Also relevant may be:
>>> - Which dts are you using?
>> Custom but based heavily on the t2080rdb.
>>> - What's the spi-nor type, and at which frequency are you operating it?
>> The board has several alternate parts for the spi-nor so the dts just 
>> specifies compatible = "jedec,spi-nor" the actual chip detected on 
>> the board I have is "n25q032a (4096 Kbytes)". The dts sets 
>> spi-max-frequency = <10000000> I haven't measured the actual 
>> frequency on the bus.
>>> - Does the issue still happen if you lower the frequency?
>> I did play around with the frequency initially but I should probably 
>> give that another go now that I have a better reproduction method.
>
> Playing around with the frequency didn't help.
>
> One thing that I've found is that the problem appears to be that I end 
> up with extra bytes in the RX FIFO. If I add code to drain the RX FIFO 
> then the system is able to keep accessing the spi-nor (albeit with 
> some noisy logs).

I've been staring at spi-fsl-espi.c for while now and I think I've 
identified a couple of deficiencies that may or may not be related to my 
issue.

First I think the 'Transfer done but SPIE_DON isn't set' message can be 
generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register. 
We also write back to it to clear the current events. We re-read it in 
fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can 
naturally end up in that situation if we're doing a large read. Consider 
the messages for reading a block of data from a spi-nor chip

  tx = READ_OP + ADDR
  rx = data

We setup the transfer and pump out the tx_buf. The first interrupt goes 
off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo, 
clear ESPI_SPIE and wait for the next interrupt. The next interrupt 
fires and this time we have ESPI_SPIE with just SPIM_RXT set. This 
continues until we've received all the data and we finish with ESPI_SPIE 
having only SPIM_RXT set. When we re-read it we complain that SPIE_DON 
isn't set.

The other deficiency is that we only get an interrupt when the amount of 
data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than 
FSL_ESPI_RXTHR left to be received we will never pull them out of the fifo.

I think the reason I'm seeing some variability is because of how fast 
(or slow) the interrupts get processed and how fast the spi-nor chip can 
fill the CPUs rx fifo.

Powered by blists - more mailing lists