[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8eedf34e-b870-4a73-b966-e9745809dff3@linaro.org>
Date: Tue, 1 Jul 2025 11:02:18 +0100
From: James Clark <james.clark@...aro.org>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: Vladimir Oltean <olteanv@...il.com>, Mark Brown <broonie@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Larisa Grigore <larisa.grigore@....com>,
Frank Li <Frank.li@....com>, Christoph Hellwig <hch@....de>,
linux-spi@...r.kernel.org, imx@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/6] spi: spi-fsl-dspi: Store status directly in
cur_msg->status
On 30/06/2025 9:41 pm, Vladimir Oltean wrote:
> On Mon, Jun 30, 2025 at 01:54:11PM +0100, James Clark wrote:
>> On 27/06/2025 10:30 pm, Vladimir Oltean wrote:
>>> On Fri, Jun 27, 2025 at 11:21:38AM +0100, James Clark wrote:
>>>> This will allow us to return a status from the interrupt handler in a
>>>> later commit and avoids copying it at the end of
>>>> dspi_transfer_one_message(). For consistency make polling and DMA modes
>>>> use the same mechanism.
>>>>
>>>> Refactor dspi_rxtx() and dspi_poll() to not return -EINPROGRESS because
>>>> this isn't actually a status that was ever returned to the core layer
>>>> but some internal state. Wherever that was used we can look at dspi->len
>>>> instead.
>>>>
>>>> No functional changes intended.
>>>>
>>>> Signed-off-by: James Clark <james.clark@...aro.org>
>>>> ---
>>>
>>> This commit doesn't work, please do not merge this patch.
>>>
>>> You are changing the logic in DMA mode, interrupt-based FIFO and PIO all
>>> in one go, in a commit whose title and primary purpose is unrelated to
>>> that. Just a mention of the type "while at it, also do that". And in
>>> that process, that bundled refactoring introduces a subtle, but severe bug.
>>>
>>> No, that is discouraged. Make one patch per logical change, where only
>>> one thing is happening and which is obviously correct. It helps you and
>>> it helps the reviewer.
>>>
>>> Please find attached a set of 3 patches that represent a broken down and
>>> corrected variant of this one. First 2 should be squashed together in
>>> your next submission, they are just to illustrate the bug that you've
>>> introduced (which can be reproduced on any SoC in XSPI mode).
>>>
>>
>> Thanks for the debugging, yes it looks like the patches could be broken down
>> a bit.
>>
>> Just for clarity, is this bug affecting host+polling mode? I can see the
>> logic bug in dspi_poll() which I must have tested less thoroughly, but I
>> can't actually see any difference in dspi_interrupt().
>
> It should affect both, I tested your patches unmodified, i.e. interrupt
> based XSPI FIFO mode (in master mode).
>
> Assume (not real numbers, just for explanation's sake) dspi->len is 2
> (2 FIFO sizes worth of 32-bit words, but let's assume for simplicity
> that each dspi_pop_tx() call simply decrements the len by 1).
>
> The correct behavior would be this:
>
> dspi_transfer_one_message()
> -> dspi->len = 2
> -> dspi_fifo_write()
> -> dspi_xspi_fifo_write()
> -> dspi_pop_tx()
> -> dspi->len = 1
> -> wait_for_completion(&dspi->xfer_done)
> <IRQ>
> dspi_interrupt()
> -> dspi_rxtx()
> -> dspi_fifo_read()
> -> dspi_fifo_write()
> -> dspi_xspi_fifo_write()
> -> dspi_pop_tx()
> -> dspi->len = 0
> <IRQ>
> dspi_interrupt()
> -> dspi_rxtx()
> -> dspi_fifo_read()
> -> complete(&dspi->xfer_done)
> -> reinit_completion(&dspi->xfer_done)
>
> but the behavior with your proposed logic is this:
>
> dspi_transfer_one_message()
> -> dspi->len = 2
> -> dspi_fifo_write()
> -> dspi_xspi_fifo_write()
> -> dspi_pop_tx()
> -> dspi->len = 1
> -> wait_for_completion(&dspi->xfer_done)
> <IRQ>
> dspi_interrupt()
> -> dspi_rxtx()
> -> dspi_fifo_read()
> -> dspi_fifo_write()
> -> dspi_xspi_fifo_write()
> -> dspi_pop_tx()
> -> dspi->len = 0
> -> complete(&dspi->xfer_done)
> -> reinit_completion(&dspi->xfer_done)
> <IRQ>
> dspi_interrupt()
> -> Second interrupt is spurious at
> this point, since the process
> context may have proceeded
> to change pointers in
> dspi->cur_transfer, etc.
>
> Clearer now? Essentially the complete() call is premature, it needs to
> be not after the dspi_fifo_write() call, but after its subsequent
> dspi_fifo_read(), which comes after yet another IRQ, in the IRQ-triggered
> path.
>
Much clearer, thanks. Not sure how I missed that, maybe a confusion
about whether it was dspi_fifo_read() or dspi_fifo_write() that modifies
dspi->len.
> Not sure why you are not able to reproduce this, maybe luck had it that
> the complete() call never woke up the process context earlier than the
> second IRQ in the above case triggered?
>
> I'm not doing anything special in particular, just booted a board with a
> SPI device driver (sja1105). This transfers some sequences of relatively
> large buffers (256 bytes) at probe time, maybe that exercises the
> controller driver more than the average peripheral driver.
It's strange because I was stressing it quite a lot, especially with the
performance testing.
Powered by blists - more mailing lists