[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wippum+MksdY7ixMfa3i1sZ+nxYPWLLpVMNyXCgmiHbBQ@mail.gmail.com>
Date: Fri, 25 Mar 2022 11:30:21 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Maxime Bizon <mbizon@...ebox.fr>
Cc: Toke Høiland-Jørgensen <toke@...e.dk>,
Robin Murphy <robin.murphy@....com>,
Christoph Hellwig <hch@....de>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
Halil Pasic <pasic@...ux.ibm.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Kalle Valo <kvalo@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Olha Cherevyk <olha.cherevyk@...il.com>,
iommu <iommu@...ts.linux-foundation.org>,
linux-wireless <linux-wireless@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
stable <stable@...r.kernel.org>
Subject: Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break
ath9k-based AP
On Fri, Mar 25, 2022 at 3:25 AM Maxime Bizon <mbizon@...ebox.fr> wrote:
>
> In the non-cache-coherent scenario, and assuming dma_map() did an
> initial cache invalidation, you can write this:
.. but the problem is that the dma mapping code is supposed to just
work, and the driver isn't supposed to know or care whether dma is
coherent or not, or using bounce buffers or not.
And currently it doesn't work.
Because what that ath9k driver does is "natural", but it's wrong for
the bounce buffer case.
And I think the problem is squarely on the dma-mapping side for two reasons:
(a) this used to work, now it doesn't, and it's unclear how many
other drivers are affected
(b) the dma-mapping naming and calling conventions are horrible and
actively misleading
That (a) is a big deal. The reason the ath9k issue was found quickly
is very likely *NOT* because ath9k is the only thing affected. No,
it's because ath9k is relatively common.
Just grep for dma_sync_single_for_device() and ask yourself: how many
of those other drivers have you ever even HEARD of, much less be able
to test?
And that's just one "dma_sync" function. Admittedly it's likely one of
the more common ones, but still..
Now, (b) is why I think driver nufgt get this so wrong - or, in this
case, possibly the dma-mapping code itself.
The naming - and even the documentation(!!!) - implies that what ath9k
does IS THE RIGHT THING TO DO.
The documentation clearly states:
"Before giving the memory to the device, dma_sync_single_for_device() needs
to be called, and before reading memory written by the device,
dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
reused"
and ath9k obviously did exactly that, even with a comment to the effect.
And I think ath9k is actually right here, but the documentation is so
odd and weak that it's the dma-mapping code that was buggy.
So the dma mapping layer literally broke the documented behavior, and
then Christoph goes and says (in another email in this discussion):
"Unless I'm misunderstanding this thread we found the bug in ath9k
and have a fix for that now?"
which I think is a gross mis-characterization of the whole issue, and
ignores *BOTH* of (a) and (b).
So what's the move forward here?
I personally think we need to
- revert commit aa6f8dcbab47 for the simple reason that it is known
to break one driver. But it is unknown how many other drivers are
affected.
Even if you think aa6f8dcbab47 was the right thing to do (and I
don't - see later), the fact is that it's new behavior that the dma
bounce buffer code hasn't done in the past, and clearly confuses
things.
- think very carefully about the ath9k case.
We have a patch that fixes it for the bounce buffer case, but you
seem to imply that it might actually break non-coherent cases:
"So I'd be very cautious assuming sync_for_cpu() and sync_for_device()
are both doing invalidation in existing implementation of arch DMA ops,
implementers may have taken some liberty around DMA-API to avoid
unnecessary cache operation (not to blame them)"
so who knows what other dma situations it might break?
Because if some non-coherent mapping infrastructure assumes that
*only* sync_for_device() will actually flush-and-invalidate caches
(because the platform thinks that once they are flushed, getting them
back to the CPU doesn't need any special ops), then you're right:
Toke's ath9k patch will just result in cache coherency issues on those
platforms instead.
- think even *more* about what the ath9k situation means for the dma
mapping naming and documentation.
I basically think the DMA syncing has at least three cases (and a
fourth combination that makes no sense):
(1) The CPU has actively written to memory, and wants to give that
data to the device.
This is "dma_sync_single_for_device(DMA_TO_DEVICE)".
A cache-coherent thing needs to do nothing.
A non-coherent thing needs to do a cache "writeback" (and probably
will flush)
A bounce buffer implementation needs to copy *to* the bounce buffer
(2) The CPU now wants to see any state written by the device since
the last sync
This is "dma_sync_single_for_cpu(DMA_FROM_DEVICE)".
A bounce-buffer implementation needs to copy *from* the bounce buffer.
A cache-coherent implementation needs to do nothing.
A non-coherent implementation maybe needs to do nothing (ie it
assumes that previous ops have flushed the cache, and just accessing
the data will bring the rigth thing back into it). Or it could just
flush the cache.
(3) The CPU has seen the state, but wants to leave it to the device
This is "dma_sync_single_for_device(DMA_FROM_DEVICE)".
A bounce buffer implementation needs to NOT DO ANYTHING (this is
the current ath9k bug - copying to the bounce buffer is wrong)
A cache coherent implementation needs to do nothing
A non-coherent implementation needs to flush the cache again, bot
not necessarily do a writeback-flush if there is some cheaper form
(assuming it does nothing in the "CPU now wants to see any state" case
because it depends on the data not having been in the caches)
(4) There is a fourth case: dma_sync_single_for_cpu(DMA_TO_DEVICE)
which maybe should generate a warning because it seems to make no
sense? I can't think of a case where this would be an issue - the data
is specifically for the device, but it's synced "for the CPU"?
Do people agree? Or am I missing something?
But I don't think the documentation lays out these cases, and I don't
think the naming is great.
I also don't think that we can *change* the naming. That's water under
the bridge. It is what it is. So I think people need to really agree
on the semantics (did I get them entirely wrong above?) and try to
think about ways to maybe give warnings for things that make no sense.
Based on my suggested understanding of what the DMA layer should do,
the ath9k code is actually doing exactly the right thing. It is doing
dma_sync_single_for_device(DMA_FROM_DEVICE);
and based on my four cases above, the bounce buffer code must do
nothing, because "for_device()" together with "FROM_DEVICE" clearly
says that all the data is coming *from* the device, and copying any
bounce buffers is wrong.
In other words, I think commit aa6f8dcbab47 ("swiotlb: rework 'fix
info leak with DMA_FROM_DEVICE'") is fundamentally wrong. It doesn't
just break ath9k, it fundamentally break that "case 3" above. It's
doing a DMA_TO_DEVICE copy, even though it was a DMA_FROM_DEVICE sync.
So I really think that "revert aa6f8dcbab47" is not only inevitable
because of practical worries about what it breaks, but because that
commit was just entirely and utterly WRONG.
But having written this long wall of text, I'm now slightly worried
that I'm just confused, and am trying to just convince myself.
So please: can people think about this a bit more, and try to shoot
down the above argument and show that I'm just being silly?
And if I'm right, can we please document this and try really hard to
come up with some sanity checks (perhaps based on some "dma buffer
state" debug code?)
Linus
Powered by blists - more mailing lists