linux-kernel - Re: [PATCH] spi: microchip-core: prevent RX overflows when transmit size

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250115-resubmit-glove-ca6ce06c9d4f@spud>
Date: Wed, 15 Jan 2025 18:22:55 +0000
From: Conor Dooley <conor@...nel.org>
To: Mark Brown <broonie@...nel.org>
Cc: linux-spi@...r.kernel.org, Conor Dooley <conor.dooley@...rochip.com>,
	stable@...r.kernel.org,
	Daire McNamara <daire.mcnamara@...rochip.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] spi: microchip-core: prevent RX overflows when transmit
 size > FIFO size

On Tue, Jan 14, 2025 at 07:44:26PM +0000, Mark Brown wrote:
> On Tue, Jan 14, 2025 at 05:13:49PM +0000, Conor Dooley wrote:
> 
> > When the size of a transfer exceeds the size of the FIFO (32 bytes), RX
> > overflows will be generated and receive data will be corrupted and
> > warnings will be produced. For example, here's an error generated by a
> > transfer of 36 bytes:
> 
> >   spi_master spi0: mchp_corespi_interrupt: RX OVERFLOW: rxlen: 4, txlen: 0
> 
> > I am not entirely sure how this happens, as rxlen being 4 means that 32
> > of 36 bytes have been read from the RX FIFO so there should be
> > sufficient room for 4 more bytes but timing is likely a factor as simply
> > adding a delay in the transmit path is enough to avoid the overflows.
> 
> The reads from the FIFO happen prior to the check for overflow in the
> interrupt handler so if we overflow then take the interrupt we will read
> the full 32 byte FIFO before it sees that there was an overflow.
> 
> > @@ -221,6 +221,13 @@ static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi)
> >  	while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) {
> >  		u32 word;
> >  
> > +		/*
> > +		 * If the transfer is larger than FIFO_DEPTH, spin until space
> > +		 * is made in the RX FIFO to avoid losing data to RX overflows
> > +		 */
> > +		while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_FULL)
> > +			;
> > +
> 
> So, this is the transmit side but we're polling the RX FIFO and not
> doing anything to clear it?  I see that the FIFO reads are driven from
> interrupt context.  If I had to guess I'd say that there's latency in
> the interrupt being delivered (possibly to a different CPU) and when the
> transfer is being driven by the TX side it's stuffing data into the TX
> FIFO faster than interrupts are being delivered (the TX side just seems
> to busy wait on there being space in the FIFO which can't be good for
> slower speeds...) so the TX and RX sides of the transfer get out of sync.
> 
> Given that AFAICT the controller has to RX all the time I suspect you
> want to move the RX processing out of interrupt context into the main
> _transfer_one() function and busy wait for that too, or push the TX side
> into the interrupt handler (which at first glance looks simpler).
> Either way the two directions will be driven from the same place and so
> not get out of sync.

Hmm, at first glance it is indeed simpler, but it may be a bit of a
poor compromise. There used to be calls to
mchp_corespi_write_fifo() in the interrupt handler, but I removed them
in the ?summer? because messages were being sent out of order. IIRC what
happens is that the tx fifo can be emptied by the controller hardware
before the whole message has been written into it, thereby generating
the interrupts and triggering a parallel call to mchp_corespi_write_fifo().

To avoid that, the first byte could be sent by transfer_one(), and then
everything else handled by the current implementation of
mchp_corespi_write_fifo(), but I think that would cause "actual" RX
overflows, without cutting down to sending (FIFO_MAX - 1) bytes per call
or ensuring that the rx fifo is read before the tx one. Given the timing
issues there seem to be around the rx ready/tx done interrupts and the
rx/tx fifo status bits being set/cleared, I don't want to play games
there. That means cutting down to something well under FIFO_MAX (which
is 32 bytes), probably to something like 1/2/4 bytes depending on
transfer size since that can be done in a single write on the bus and
will allow the same code to be used in transfer_one() and in the
interrupt handler.

Moving it out of the interrupt handler entirely probably makes the code
a lot easier to reason about, so I think I'd actually rather do that.
That said, I don't think I would want that change going into fixes as it's
a pretty intrusive change, although given where we are in the cycle it
ain't as much of a problem since I'd not be sending it until after 6.13
is released anyway.

Cheers,
Conor.

Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)