Message-ID: <acec0279-f5a2-2b0c-e044-6200e7a37e37@linux.ibm.com>
Date: Fri, 18 Mar 2022 09:10:47 -0500
From: Eddie James <eajames@...ux.ibm.com>
To: Joel Stanley <joel@....id.au>
Cc: OpenBMC Maillist <openbmc@...ts.ozlabs.org>,
Mark Brown <broonie@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-spi@...r.kernel.org
Subject: Re: [PATCH] spi: fsi: Implement a timeout for polling status

On 3/17/22 23:19, Joel Stanley wrote:
> On Thu, 17 Mar 2022 at 21:14, Eddie James <eajames@...ux.ibm.com> wrote:
>> The data transfer routines must poll the status register to
>> determine when more data can be shifted in or out. If the hardware
>> gets into a bad state, these polling loops may never exit. Prevent
>> this by returning an error if a timeout is exceeded.
> This makes sense. We may even want to put this code in regardless.
>
> However, I'm wondering why the code in fsi_spi_status didn't catch this.
Same, which is why, for a long time, I thought the problem couldn't be
happening here. See below for what I think is going on.
>
>> static int fsi_spi_status(struct fsi_spi *ctx, u64 *status, const char *dir)
>> {
>> int rc = fsi_spi_read_reg(ctx, SPI_FSI_STATUS, status);
> You mentioned the error condition is we get back 0xff. That means that
> status will be 0xffff_ffff_ffff_ffff ?
>
> Did you observe status being this value?
No, I think my observation of 0xff is not universal. I suspect that
0xff is returned while the CFAM is in reset, but once the reset has
completed, valid (though maybe uninitialized) data is returned. I
observed a status of 0x0001100000000000, which means the controller is
idle; that makes sense, since it has just been reset. So the issue
occurs if we start an operation before a CFAM reset and are still
waiting for it to complete during the reset, and we also don't see any
failed or invalid FSI operations during or after the reset. This is
very timing dependent: the FSI master does lock during the reset, but
it doesn't wait afterwards for the hardware to initialize.
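
To make the failure mode concrete, here is a small userspace sketch of
why that status value slips past the existing checks. The error mask is
the 0xffe0f000 value quoted in this thread; the TDR_FULL and RDR_FULL
bit positions and all helper names are assumptions for illustration,
not copied from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Error mask as quoted in this thread. */
#define STATUS_ERROR_MASK   0xffe0f000ULL
/* Assumed bit positions, for illustration only. */
#define STATUS_TDR_FULL     (1ULL << 59)
#define STATUS_RDR_FULL     (1ULL << 63)

/* Status value observed after the CFAM reset. */
#define STATUS_AFTER_RESET  0x0001100000000000ULL

/* True if any bit of the error mask is set. */
static bool status_has_error(uint64_t status)
{
	return (status & STATUS_ERROR_MASK) != 0;
}

/* Mirrors the RX loop condition: keep polling while RDR is not full. */
static bool rx_keeps_polling(uint64_t status)
{
	return !(status & STATUS_RDR_FULL);
}

/* Mirrors the TX loop condition: keep polling while TDR is full. */
static bool tx_keeps_polling(uint64_t status)
{
	return (status & STATUS_TDR_FULL) != 0;
}
```

Under those assumptions the observed status reports no error and never
sets RDR_FULL, so an RX wait that was in flight across the reset would
keep polling indefinitely unless something else bounds the loop.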
>
>> if (rc)
>> return rc;
>>
>> if (*status & SPI_FSI_STATUS_ANY_ERROR) {
> I think that we're checking against 0xffe0f000.
>
>> dev_err(ctx->dev, "%s error: %016llx\n", dir, *status);
>>
>> rc = fsi_spi_reset(ctx);
>> if (rc)
>> return rc;
> Is the problem here? fsi_spi_reset writes to the clock config
> registers, but doesn't read the status.
>
> Obviously doing the writes causes a call to fsi_spi_check_status, but
> that checks the FSI2SPI bridge, not the SPI master.
>
> ...but it doesn't matter, because we're either going to return an
> error from the reset, or return EREMOTEIO, so there's no masking of
> the error.
Not sure I follow. I don't think we were hitting this path in this error
scenario. Do you think we need to check the status after a reset? It
should always be the same.
>
>> return -EREMOTEIO;
>> }
>>
>> return 0;
>> }
>
>> Signed-off-by: Eddie James <eajames@...ux.ibm.com>
>> ---
>> drivers/spi/spi-fsi.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/spi/spi-fsi.c b/drivers/spi/spi-fsi.c
>> index b6c7467f0b59..d403a7a3021d 100644
>> --- a/drivers/spi/spi-fsi.c
>> +++ b/drivers/spi/spi-fsi.c
>> @@ -25,6 +25,7 @@
>>
>> #define SPI_FSI_BASE 0x70000
>> #define SPI_FSI_INIT_TIMEOUT_MS 1000
>> +#define SPI_FSI_STATUS_TIMEOUT_MS 100
> Can you add a comment (or put something in the commit message) about
> why you chose 100ms.
Hm, sure, but I chose it pretty arbitrarily. I'm not sure how to choose
something like this.
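
One thing that may help when picking a value is the tick granularity
the timeout ends up with. The sketch below is a simplified userspace
model of a millisecond-to-tick conversion that rounds up, which is my
understanding of how msecs_to_jiffies() behaves; the function name and
the hz parameter are made up for illustration.

```c
/*
 * Simplified model of a rounding-up millisecond-to-tick conversion.
 * hz is the number of timer ticks per second (hypothetical parameter).
 */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}
```

With this model, 100 ms is 10 ticks at 100 ticks per second and 100
ticks at 1000 ticks per second, so the timeout is several ticks long
even at a coarse tick rate.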
>
>> #define SPI_FSI_MAX_RX_SIZE 8
>> #define SPI_FSI_MAX_TX_SIZE 40
>>
>> @@ -299,6 +300,7 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx,
>> struct spi_transfer *transfer)
>> {
>> int rc = 0;
>> + unsigned long end;
>> u64 status = 0ULL;
>>
>> if (transfer->tx_buf) {
>> @@ -315,10 +317,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx,
>> if (rc)
>> return rc;
>>
>> + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS);
>> do {
>> rc = fsi_spi_status(ctx, &status, "TX");
>> if (rc)
>> return rc;
>> +
>> + if (time_after(jiffies, end))
>> + return -ETIMEDOUT;
>> } while (status & SPI_FSI_STATUS_TDR_FULL);
>>
>> sent += nb;
>> @@ -329,10 +335,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx,
>> u8 *rx = transfer->rx_buf;
>>
>> while (transfer->len > recv) {
>> + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS);
>> do {
>> rc = fsi_spi_status(ctx, &status, "RX");
>> if (rc)
>> return rc;
>> +
>> + if (time_after(jiffies, end))
>> + return -ETIMEDOUT;
>> } while (!(status & SPI_FSI_STATUS_RDR_FULL));
>>
>> rc = fsi_spi_read_reg(ctx, SPI_FSI_DATA_RX, &in);
>> --
>> 2.27.0
>>
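
For reference, the shape of the bounded polling loop in the patch can
be modelled in userspace roughly as follows. This is only a sketch:
struct fake_hw, fake_read_status(), poll_ready() and READY_BIT are
hypothetical stand-ins for the hardware and the driver helpers, and
tick_after() imitates the wrap-safe comparison that time_after()
performs on jiffies.

```c
#include <errno.h>
#include <stdint.h>

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the hardware and the tick counter. */
struct fake_hw {
	unsigned long now;      /* current tick, advances on every read */
	unsigned long ready_at; /* tick at which the ready bit appears */
};

#define READY_BIT (1ULL << 63) /* made-up "data ready" flag */

/* Return the status and advance time by one tick per register read. */
static uint64_t fake_read_status(struct fake_hw *hw)
{
	/* Ready once ready_at is no longer in the future. */
	uint64_t status = tick_after(hw->ready_at, hw->now) ? 0 : READY_BIT;

	hw->now++;
	return status;
}

/* Poll until READY_BIT is set or timeout_ticks have elapsed. */
static int poll_ready(struct fake_hw *hw, unsigned long timeout_ticks)
{
	unsigned long end = hw->now + timeout_ticks;
	uint64_t status;

	do {
		status = fake_read_status(hw);
		if (tick_after(hw->now, end))
			return -ETIMEDOUT;
	} while (!(status & READY_BIT));

	return 0;
}
```

If the fake hardware becomes ready within the timeout, poll_ready()
returns 0; if it never does, the loop exits with -ETIMEDOUT instead of
spinning, including when the tick counter wraps around during the wait.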