linux-kernel - Re: [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer


Message-ID: <20250610210147.kwuwwjtcl36hrxjc@skbuf>
Date: Wed, 11 Jun 2025 00:01:47 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: James Clark <james.clark@...aro.org>
Cc: Vladimir Oltean <olteanv@...il.com>, Mark Brown <broonie@...nel.org>,
	linux-spi@...r.kernel.org, imx@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before
 initiating transfer

On Tue, Jun 10, 2025 at 04:41:04PM +0100, James Clark wrote:
> On 10/06/2025 12:34 pm, Vladimir Oltean wrote:
> > On Mon, Jun 09, 2025 at 04:32:38PM +0100, James Clark wrote:
> > > In target mode, extra interrupts can be received between the end of a
> > > transfer and halting the module if the host continues sending more data.
> > 
> > Presumably you mean not just any extra interrupts can be received, but
> > specifically CMDTCF, since that triggers the complete(&dspi->xfer_done)
> > call. Other interrupt sources are masked in XSPI mode and should be
> > irrelevant.
> 
> Yes complete(&dspi->xfer_done) is called so CMDTCF is set. For example in
> one case of underflow I get SPI_SR = 0xca8b0450, which is these flags:
> 
>   TCF, TXRXS, TFUF, TFFF, CMDTCF, RFOF, RFDF, CMDFFF
> 
> Compared to a successful transfer I get 0xc2830330:
> 
>   TCF, TXRXS,       TFFF, CMDTCF,       RFDF, CMDFFF
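
For readers following along, the two SPI_SR values quoted above can be decoded with a small standalone helper. This is only a sketch: the bit positions are assumptions inferred from the two values and flag lists in this thread, sr_decode() is a hypothetical name that is not part of the driver, and the low 16 bits (counters and pointers) are left undecoded.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit positions, inferred from the values quoted in this thread. */
struct sr_flag {
	uint32_t mask;
	const char *name;
};

static const struct sr_flag sr_flags[] = {
	{ 1u << 31, "TCF" },
	{ 1u << 30, "TXRXS" },
	{ 1u << 27, "TFUF" },   /* TX FIFO underflow */
	{ 1u << 25, "TFFF" },
	{ 1u << 23, "CMDTCF" },
	{ 1u << 19, "RFOF" },
	{ 1u << 17, "RFDF" },
	{ 1u << 16, "CMDFFF" },
};

/*
 * Hypothetical helper: write the names of the set flags into buf as a
 * comma-separated list and return buf. Decoding stops early if buf is
 * too small.
 */
static char *sr_decode(uint32_t sr, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(sr_flags) / sizeof(sr_flags[0]); i++) {
		int n;

		if (!(sr & sr_flags[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? ", " : "", sr_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
	return buf;
}
```

Under these assumptions, the only flags that differ between the underflow case and the successful case are TFUF and RFOF.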

Ok, so my new question would be: if CMDTCF is set, presumably it means a
command was transferred. What command was transferred, and who put data
in the FIFO for it?

Because the answer to the above is AFAIU "no one", I guess the driver
should ignore CMDTCF when TFUF (TX FIFO underflow) is set; I consider
that to be the logic bug. You are also doing that in patch 4/4, except
you still call complete() for some reason. If you don't call complete(),
there is no reason to fend against spurious completions.

I think I would prefer seeing more deliberate decisions in the driver,
it helps if things don't just work by coincidence.
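
A minimal sketch of the decision suggested above, written as a standalone predicate rather than as driver code. The name sr_xfer_done() is hypothetical, and the bit positions are assumptions inferred from the SPI_SR values quoted earlier in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, inferred from the SPI_SR values in this thread. */
#define SR_TFUF		(1u << 27)	/* TX FIFO underflow */
#define SR_CMDTCF	(1u << 23)	/* command transfer complete */

/*
 * Hypothetical predicate: report a transfer as done only when CMDTCF is
 * set and no TX FIFO underflow was flagged. With an underflow, nobody
 * queued data for the "command", so CMDTCF is not trusted.
 */
static bool sr_xfer_done(uint32_t sr)
{
	if (!(sr & SR_CMDTCF))
		return false;
	if (sr & SR_TFUF)
		return false;
	return true;
}
```

With the two values from the thread, sr_xfer_done(0xca8b0450) is false and sr_xfer_done(0xc2830330) is true, so an interrupt handler built on such a check would not call complete() in the underflow case.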

> > > If the interrupt from this occurs after the reinit_completion() then the
> > > completion counter is left at a non-zero value. The next unrelated
> > > transfer initiated by userspace will then complete immediately without
> > > waiting for the interrupt or writing to the RX buffer.
> > > 
> > > Fix it by resetting the counter before the transfer so that lingering
> > > values are cleared. This is done after clearing the FIFOs and the
> > > status register but before the transfer is initiated, so no interrupts
> > > should be received at this point resulting in other race conditions.
> > 
> > Sorry, I don't have a lot of experience with the target mode, and when I
> > introduced the XSPI FIFO mode, I didn't take target mode into consideration.
> > 
> > The question is, does the module support XSPI FIFO writes in target
> > mode? In the LS1028A reference manual, I see PUSHR_SLAVE has the upper
> > 16 bits (for the command) hidden, specifically there is no CTAS field
> > there that would point to one of the CTARE0/CTARE1 registers.
> > Cross-checking with the S32G3 RM, I see nothing fundamentally different.
> > 
> > I am surprised, given this fact, that the CMDTCF interrupt would fire at
> > all in target mode.
> 
> It's working in my testing where I've forced it to XSPI mode instead of DMA
> mode on S32G3. I assume the command is blank because in target mode CTAR0
> (aka CTAR0_SLAVE) is always used regardless of the frame.
> 
> CTARE0 isn't explicitly relabeled like CTAR0, but this paragraph states that
> CTARE0 is used:
> 
>   50.4.3.2 Slave mode
> 
>   ... The SPI Slave mode transfer attributes are configured in the CTAR0
>   and CTARE0 registers ...

That's an interesting piece of data which I wasn't aware of, thanks.

> Any transfers smaller than the FIFO are working in interrupt mode, although
> larger ones are problematic because there isn't enough time to reload the
> FIFOs while the host is still sending (hence the error I added in patch 4).
> 
> Polling mode isn't working at all because it has a timeout which gets hit
> and returns -ETIMEDOUT before the host sends anything. Although I added the
> check there for consistency and for catching host mode errors.
> 
> > > 
> > > Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
> > 
> > To be clear, if you ran 'git bisect' to track down this issue, it
> > wouldn't have pointed you to this commit, would it?
> 
> I didn't test it no, but I did assume that the wake_up_interruptible() that
> got replaced wasn't vulnerable to this same issue. Because the spurious
> wake_up_interruptible() would be "lost", and a fresh one from the next
> transfer would have been required to proceed past the
> wait_event_interruptible().
> 
> Whereas wait_for_completion() is just a counter so it has the memory problem
> explained in the commit message.

Why would a spurious wake_up_interruptible() be lost? Is it because of
the dspi->waitflags condition not becoming 1? It would also become 1...
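
To make the comparison concrete, here is a single-threaded toy model of both mechanisms. It is not kernel code: the toy_* names are hypothetical, and the way the old driver set and cleared waitflags is an assumption based only on the description in this thread.

```c
#include <stdbool.h>

/* Toy completion: a counter, as described in the commit message. */
struct toy_completion {
	unsigned int done;
};

static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

static void toy_reinit(struct toy_completion *c)
{
	c->done = 0;
}

/* True if a waiter would proceed immediately instead of sleeping. */
static bool toy_wait_returns_now(struct toy_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/* Toy wait queue guarded by a flag (assumed old-driver behaviour). */
struct toy_waitq {
	int waitflags;
};

/* Models the interrupt handler setting the flag before waking waiters. */
static void toy_irq_wake(struct toy_waitq *w)
{
	w->waitflags = 1;
}

/* True if the wait condition is already satisfied on entry. */
static bool toy_wait_event_returns_now(struct toy_waitq *w)
{
	if (!w->waitflags)
		return false;
	w->waitflags = 0;	/* assumed: waiter clears the flag after waking */
	return true;
}
```

In this model a spurious toy_complete() makes the next toy_wait_returns_now() succeed unless toy_reinit() runs first, and a spurious toy_irq_wake() has the same effect on toy_wait_event_returns_now(), because the flag persists just like the counter does.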