linux-kernel - Re: [PATCH v11 2/3] fpga: microchip-spi: add Microchip MPF FPGA manager

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1557776-36c9-083f-2101-db84ca9a9cfa@microchip.com>
Date:   Wed, 11 May 2022 11:36:33 +0000
From:   <Conor.Dooley@...rochip.com>
To:     <i.bornyakov@...rotek.ru>
CC:     <mdf@...nel.org>, <hao.wu@...el.com>, <yilun.xu@...el.com>,
        <trix@...hat.com>, <robh+dt@...nel.org>,
        <krzysztof.kozlowski+dt@...aro.org>, <linux-fpga@...r.kernel.org>,
        <devicetree@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <system@...rotek.ru>
Subject: Re: [PATCH v11 2/3] fpga: microchip-spi: add Microchip MPF FPGA
 manager

On 11/05/2022 09:15, Ivan Bornyakov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Tue, May 10, 2022 at 12:29:54PM +0100, Conor Dooley wrote:
>> On 09/05/2022 19:56, Conor Dooley wrote:
>>> On 09/05/2022 18:16, Ivan Bornyakov wrote:
>>>> On Mon, May 09, 2022 at 11:41:18AM +0000, Conor.Dooley@...rochip.com wrote:
>>>>> Hey Ivan, one comment below.
>>>>> Thanks,
>>>>> Conor.
>>>>>
>>>>> On 07/05/2022 08:43, Ivan Bornyakov wrote:
>>>>>> ... snip ...
>>>>>> +static int mpf_read_status(struct spi_device *spi)
>>>>>> +{
>>>>>> +       u8 status, status_command = MPF_SPI_READ_STATUS;
>>>>>> +       struct spi_transfer xfer = {
>>>>>> +               .tx_buf = &status_command,
>>>>>> +               .rx_buf = &status,
>>>>>> +               .len = 1,
>>>>>> +       };
>>>>>> +       int ret = spi_sync_transfer(spi, &xfer, 1);
>>>>>> +
>>>>>> +       if ((status & MPF_STATUS_SPI_VIOLATION) ||
>>>>>> +           (status & MPF_STATUS_SPI_ERROR))
>>>>>> +               ret = -EIO;
>>>>>> +
>>>>>> +       return ret ? : status;
>>>>>> +}
>>>>>> +
>>>>>> ... snip ...
>>>>>> +
>>>>>> +static int poll_status_not_busy(struct spi_device *spi, u8 mask)
>>>>>> +{
>>>>>> +       int status, timeout = MPF_STATUS_POLL_TIMEOUT;
>>>>>> +
>>>>>> +       while (timeout--) {
>>>>>> +               status = mpf_read_status(spi);
>>>>>> +               if (status < 0 ||
>>>>>> +                   (!(status & MPF_STATUS_BUSY) && (!mask || (status & mask))))
>>>>>> +                       return status;
>>>>>> +
>>>>>> +               usleep_range(1000, 2000);
>>>>>> +       }
>>>>>> +
>>>>>> +       return -EBUSY;
>>>>>> +}
>>>>>
>>>>> Is there a reason you changed this from the snippet you sent me
>>>>> in the responses to version 8:
>>>>> static int poll_status_not_busy(struct spi_device *spi, u8 mask)
>>>>> {
>>>>>          u8 status, status_command = MPF_SPI_READ_STATUS;
>>>>>          int ret, timeout = MPF_STATUS_POLL_TIMEOUT;
>>>>>          struct spi_transfer xfer = {
>>>>>                  .tx_buf = &status_command,
>>>>>                  .rx_buf = &status,
>>>>>                  .len = 1,
>>>>>          };
>>>>>
>>>>>          while (timeout--) {
>>>>>                  ret = spi_sync_transfer(spi, &xfer, 1);
>>>>>                  if (ret < 0)
>>>>>                          return ret;
>>>>>
>>>>>                  if (!(status & MPF_STATUS_BUSY) && (!mask || (status & mask)))
>>>>>                          return status;
>>>>>
>>>>>                  usleep_range(1000, 2000);
>>>>>          }
>>>>>
>>>>>          return -EBUSY;
>>>>> }
>>>>>
>>>>> With the current version, I hit the "Failed to write bitstream
>>>>> frame" check in mpf_ops_write at random points in the transfer.
>>>>> Replacing poll_status_not_busy with the above allows it to run
>>>>> to completion.
>>>>
>>>> In my eyes they are equivalent, aren't they?
>>>>
>>>
>>> I was in a bit of a rush today & didn't have time to do proper
>>> debugging, I'll put some debug code in tomorrow and try to find
>>> exactly what is different between the two.
>>>
>>> Off the top of my head, since I don't have a board on me to test,
>>> the only difference I can see is that with the snippet you only
>>> checked if spi_sync_transfer was negative whereas now you check
>>> if it has a value at all w/ that ternary operator.
>>>
>>> But even that seems like it *shouldn't* be the problem, since ret
>>> should contain -errno or zero, right?
>>> Either way, I will do some digging tomorrow.
>>
>> I put a printk("status %x, ret %d", status, ret); into the failure
>> path of mpf_read_status() & it looks like a status 0xA is being
>> returned - error & ready? That seems like a very odd combo to be
>> getting back out of it. It shouldn't be dodgy driver/connection
>> either, b/c that's what I see if I connect my protocol analyser:
>> https://i.imgur.com/VbjgfCk.png
>>
>> That's mosi (hex), ss, sclk, mosi, miso (hex), miso in descending
>> order.
>>
>> I think what was happening was with the snippet you returned one
>> of the following: -EBUSY, ret (aka -errno) or status. Since status
>> is positive, the checks in mpf_spi_write.*() saw nothing wrong at
>> all and programming continued despite there being a problem.
>>
>> The new version fixes this by returning -EIO rather than status from
>> poll_status_not_busy().
>>
>> I wish I had a socketable PolarFire so I could investigate further,
>> but this looks like it might a be hardware issue somewhere on my
>> end?
>>
>> So ye, sorry for the noise and carry on! I'll try tofind what is to
>> blame for it.
>>
>> Thanks,
>> Conor.
>>
> 
> Hi, Conor.
> 
> I've just noticed in SPI-DirectC User Guide [1] ch. 9 SmartFusion2 and
> IGLOO2 SPI-Slave Programming Waveform Analysis, that hw status checked
> two times every time. Does MPF family also need double check hw status?
> Does adding second mpf_read_status() to poll_status_not_busy() routine
> help with your issue?

Hey Ivan,
Tried your suggestion. Previously I was failing quite consistently at
transfer 34 of 590k, and sometimes making it a further. With your
suggestion, I was making it significantly further (100k+) but still
running into some of the 0xA status.
Decided to move the double check into mpfs_read_status (see the below
diff) did not run into any the 0xA statuses.
It's worth pointing out that this is the *first* time I have seen
Flash Pro Express report that the FPGA array has been enabled after
programming!

Seems like at the very least this (hacky) diff is not harmful?
Please give it a try yourself and check that things still work for
you.

diff --git a/drivers/fpga/microchip-spi.c b/drivers/fpga/microchip-spi.c
index 63b75dff2522..183cdfc05c4a 100644
--- a/drivers/fpga/microchip-spi.c
+++ b/drivers/fpga/microchip-spi.c
@@ -47,18 +47,30 @@ struct mpf_priv {
  static int mpf_read_status(struct spi_device *spi)
  {
         u8 status, status_command = MPF_SPI_READ_STATUS;
+       u8 status_repeat;
         struct spi_transfer xfer = {
                 .tx_buf = &status_command,
                 .rx_buf = &status,
                 .len = 1,
         };
+       struct spi_transfer xfer_repeat = {
+               .tx_buf = &status_command,
+               .rx_buf = &status_repeat,
+               .len = 1,
+       };
         int ret = spi_sync_transfer(spi, &xfer, 1);
+       int ret_repeat = spi_sync_transfer(spi, &xfer_repeat, 1);
+
+       if (ret || ret_repeat)
+               return -EIO;
  
-       if ((status & MPF_STATUS_SPI_VIOLATION) ||
-           (status & MPF_STATUS_SPI_ERROR))
+       if (status != status_repeat)
+               printk("status disagreement %x %x", status, status_repeat);
+       if ((status_repeat & MPF_STATUS_SPI_VIOLATION) ||
+           (status_repeat & MPF_STATUS_SPI_ERROR))
                 ret = -EIO;
  
-       return ret ? : status;
+       return ret ?: status_repeat;
  }
  
  static enum fpga_mgr_states mpf_ops_state(struct fpga_manager *mgr)