Message-ID: <19b4ad5f9909446ea0eca93f9b5b4c40@AcuMS.aculab.com>
Date: Fri, 25 Mar 2022 21:40:20 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Johannes Berg' <johannes@...solutions.net>,
Linus Torvalds <torvalds@...ux-foundation.org>
CC: Maxime Bizon <mbizon@...ebox.fr>,
Toke Høiland-Jørgensen <toke@...e.dk>,
Robin Murphy <robin.murphy@....com>,
Christoph Hellwig <hch@....de>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
Halil Pasic <pasic@...ux.ibm.com>,
"Marek Szyprowski" <m.szyprowski@...sung.com>,
Kalle Valo <kvalo@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
"Paolo Abeni" <pabeni@...hat.com>,
Olha Cherevyk <olha.cherevyk@...il.com>,
iommu <iommu@...ts.linux-foundation.org>,
linux-wireless <linux-wireless@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
stable <stable@...r.kernel.org>
Subject: RE: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break
ath9k-based AP

I've been thinking about the case where a descriptor ring has
to be in non-coherent memory (e.g. because that is all there is).

The receive ring processing isn't actually that difficult.
The driver has to fill a cache line's worth of new buffer
descriptors in memory, but without yet assigning the first
buffer to the hardware.
Then it has to do a cache line writeback of just that line.
Then it can assign ownership of the first buffer and
finally do a second cache line writeback.
(The first explicit writeback can be skipped if cache line
writebacks are known to be atomic, since the hardware then
sees either the old line or the complete new one.)
It then must not dirty that cache line again, since a later
eviction would overwrite anything the hardware has written
to it.
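A minimal userspace sketch of that refill order, assuming 16-byte descriptors, 64-byte cache lines and a device that consumes descriptors strictly in ring order. Non-coherent memory is simulated with two copies of the ring (`cpu_view` for the CPU cache, `dev_view` for what the device sees); `struct rx_desc`, `RX_OWN_DEVICE`, `writeback_line()` and `rx_refill_line()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 16-byte descriptors in 64-byte cache lines. */
#define CACHE_LINE 64
#define RING_LINES 4

struct rx_desc {
    uint64_t buf_addr; /* DMA address of the receive buffer */
    uint32_t len;      /* frame length, filled in by the device */
    uint32_t flags;    /* RX_OWN_DEVICE set: the device may use it */
};

#define RX_OWN_DEVICE 1u
#define DESC_PER_LINE (CACHE_LINE / sizeof(struct rx_desc))

/* Simulated non-coherent memory: CPU's cached copy and the RAM the device sees. */
static struct rx_desc cpu_view[RING_LINES * DESC_PER_LINE];
static struct rx_desc dev_view[RING_LINES * DESC_PER_LINE];
static int writebacks;

/* Hypothetical stand-in for writing back (cleaning) one whole cache line. */
static void writeback_line(size_t line)
{
    size_t first = line * DESC_PER_LINE;

    memcpy(&dev_view[first], &cpu_view[first], CACHE_LINE);
    writebacks++;
}

/* Give every buffer in cache line `line` to the device. */
static void rx_refill_line(size_t line, const uint64_t bufs[])
{
    size_t first = line * DESC_PER_LINE;

    assert(line < RING_LINES);
    /* 1. Fill the whole line, but keep the first descriptor back. */
    for (size_t i = 0; i < DESC_PER_LINE; i++) {
        cpu_view[first + i].buf_addr = bufs[i];
        cpu_view[first + i].len = 0;
        cpu_view[first + i].flags = (i == 0) ? 0 : RX_OWN_DEVICE;
    }
    /* 2. Write the line back: the device can see the new descriptors
     *    but cannot start on them while it waits on the first. */
    writeback_line(line);
    /* 3. Hand over the first buffer and write the line back again. */
    cpu_view[first].flags = RX_OWN_DEVICE;
    writeback_line(line);
    /* 4. From here the CPU must not dirty this line: a later eviction
     *    would overwrite status the device writes into it. */
}
```

Withholding only the first descriptor is enough here only because of the in-order assumption: the device cannot reach the other descriptors in the line until it has been given the first.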

To check for new frames it must invalidate the cache
line that contains the 'next to be filled' descriptor
and then read that cache line.
This will contain info about one or more received frames,
but the hardware may still be updating other descriptors
in the same line.
Refilling (writebacks) and checking for frames (invalidates)
can both be happening at the same time, on different parts
of the buffer.
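A sketch of the polling side under a similar simulation, assuming 64-byte lines, 16-byte descriptors and a 16-entry ring, with hypothetical `cpu_view`/`dev_view` copies standing in for the CPU cache and RAM. `invalidate_line_of()` stands in for an invalidate followed by a read of that line and is called once per line visited; a descriptor the device still owns ends the scan and is simply re-checked on the next poll. The CPU only reads the lines here, it never writes to them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 16-byte descriptors in 64-byte cache lines, 16-entry ring. */
#define CACHE_LINE 64
#define RING_DESCS 16

struct rx_desc {
    uint64_t buf_addr; /* DMA address of the receive buffer */
    uint32_t len;      /* frame length, filled in by the device */
    uint32_t flags;    /* RX_OWN_DEVICE cleared by the device on completion */
};

#define RX_OWN_DEVICE 1u
#define DESC_PER_LINE (CACHE_LINE / sizeof(struct rx_desc))

/* Simulated non-coherent memory: CPU's cached copy and the RAM the device writes. */
static struct rx_desc cpu_view[RING_DESCS];
static struct rx_desc dev_view[RING_DESCS];

/* Hypothetical stand-in for invalidating the line holding idx and re-reading it. */
static void invalidate_line_of(size_t idx)
{
    size_t first = idx - idx % DESC_PER_LINE;

    assert(idx < RING_DESCS);
    memcpy(&cpu_view[first], &dev_view[first], CACHE_LINE);
}

/* Count completed frames from *next onwards and advance *next past them. */
static size_t rx_poll(size_t *next)
{
    size_t done = 0;
    size_t cached_line = SIZE_MAX; /* no line read yet */

    while (done < RING_DESCS) {
        size_t line = *next / DESC_PER_LINE;

        if (line != cached_line) {
            invalidate_line_of(*next); /* one invalidate + read per line */
            cached_line = line;
        }
        /* Still owned by the device: stop; it is re-read on the next poll. */
        if (cpu_view[*next].flags & RX_OWN_DEVICE)
            break;
        done++;
        *next = (*next + 1) % RING_DESCS;
    }
    return done;
}
```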
So you need to know a 'cache line size' for the mapping
and be able to do writebacks and invalidates for parts
of the buffer, not just all of it.
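The arithmetic behind a partial writeback or invalidate is rounding a byte range outward to whole cache lines. A sketch assuming a power-of-two line size; for reference, Linux has `dma_get_cache_alignment()`, and `dma_sync_single_range_for_cpu()`/`dma_sync_single_range_for_device()` accept an offset and size within a mapping. `struct span`, `line_align()` and `shares_line()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Byte range [start, end) within the ring's mapping. */
struct span {
    size_t start, end;
};

/* Expand [offset, offset + len) outward to whole cache lines. */
static struct span line_align(size_t offset, size_t len, size_t line_size)
{
    struct span s;

    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    s.start = offset & ~(line_size - 1);
    s.end = (offset + len + line_size - 1) & ~(line_size - 1);
    return s;
}

/* True if the two byte ranges touch a common cache line, i.e. a writeback
 * of one could clobber device writes to the other (or an invalidate of one
 * could discard CPU writes to the other). */
static bool shares_line(size_t off_a, size_t len_a,
                        size_t off_b, size_t len_b, size_t line_size)
{
    struct span a = line_align(off_a, len_a, line_size);
    struct span b = line_align(off_b, len_b, line_size);

    return a.start < b.end && b.start < a.end;
}
```

`shares_line()` is the check that matters: a region the CPU writes back and a region the device is writing must never round out to a common line.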

The transmit side is harder.
It requires either waiting for all pending transmits to
finish, or splitting a single transmit into enough fragments
that its descriptors end on a cache line boundary.
But again, if the interface is busy, you want the CPU
to be able to update one cache line of transmit descriptors
while the device is writing transmit completion status
to the previous cache line.
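The padding arithmetic for the "split into enough fragments" option might look like this, assuming the ring size is a multiple of the descriptors per line (so indices can be reduced modulo the line without caring about wrap) and, purely for illustration, that every fragment must carry at least one byte. `tx_descs_to_line_end()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many descriptors a frame should be split across so that its
 * last descriptor is the last one in a cache line, or 0 if the frame is
 * too short to split that finely.
 *   next      ring index of the next free transmit descriptor
 *   needed    descriptors the frame needs anyway
 *   per_line  descriptors per cache line
 *   frame_len frame length in bytes (assumed: >= 1 byte per fragment) */
static size_t tx_descs_to_line_end(size_t next, size_t needed,
                                   size_t per_line, size_t frame_len)
{
    size_t end, pad, total;

    assert(per_line > 0 && needed > 0);
    end = next + needed;                          /* one past the last descriptor */
    pad = (per_line - end % per_line) % per_line; /* extra fragments to line end */
    total = needed + pad;
    return total <= frame_len ? total : 0;
}
```

When it returns 0 the driver is left with the other option: wait for all pending transmits to finish.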

I don't think this is materially different for non-coherent
memory or bounce buffers.
Either way, partial flush/invalidate is needed.
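With bounce buffers the same two operations become copies: a "writeback" copies CPU-written bytes into the bounce buffer and an "invalidate" copies device-written bytes back. A sketch with a hypothetical `struct bounce_map` (not swiotlb's real data structures) showing why each copy must be limited to the requested range: syncing the whole buffer in either direction would overwrite the part the other side is updating.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounce-buffered mapping: the driver works on `orig`,
 * the device only ever sees `bounce`. */
struct bounce_map {
    unsigned char *orig;
    unsigned char *bounce;
    size_t size;
};

/* "Writeback" analogue: push CPU-written bytes to the device's copy. */
static void bounce_sync_for_device(struct bounce_map *m, size_t off, size_t len)
{
    assert(off + len <= m->size);
    memcpy(m->bounce + off, m->orig + off, len);
}

/* "Invalidate" analogue: pull device-written bytes back for the CPU. */
static void bounce_sync_for_cpu(struct bounce_map *m, size_t off, size_t len)
{
    assert(off + len <= m->size);
    memcpy(m->orig + off, m->bounce + off, len);
}
```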
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)