[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjJp5xCx0CCrLCzFGZyyABYSNNNa0i=4fN3fBydP7r97w@mail.gmail.com>
Date: Fri, 25 Mar 2022 14:56:50 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Johannes Berg <johannes@...solutions.net>
Cc: Maxime Bizon <mbizon@...ebox.fr>,
Toke Høiland-Jørgensen <toke@...e.dk>,
Robin Murphy <robin.murphy@....com>,
Christoph Hellwig <hch@....de>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
Halil Pasic <pasic@...ux.ibm.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Kalle Valo <kvalo@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Olha Cherevyk <olha.cherevyk@...il.com>,
iommu <iommu@...ts.linux-foundation.org>,
linux-wireless <linux-wireless@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
stable <stable@...r.kernel.org>
Subject: Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break
ath9k-based AP
On Fri, Mar 25, 2022 at 2:13 PM Johannes Berg <johannes@...solutions.net> wrote:
>
> Well I see now that you said 'cache "writeback"' in (1), and 'flush' in
> (2), so perhaps you were thinking of the same, and I'm just calling it
> "flush" and "invalidate" respectively?
Yeah, so I mentally tend to think of the operations as just
"writeback" (which doesn't invalidate) and "flush" (which is a
writeback-invalidate).
Which explains my word-usage, but isn't great, because it's not
well-defined. And I'm not even consistent about it.
> However, this seems to be the wrong spot to flush (writeback?) the
> cache, as we're trying to get data from the device, not potentially
> overwrite the data that the device wrote because we have a dirty
> cacheline. Hmm. Then again, how could we possibly have a dirty
> cacheline?
So in my model, (1) is the only case where there's actively dirty data
that needs to be written back. That's the "CPU wrote the data to
memory, and wants to transfer it to the device" case.
In (2) and (3), the only question is whether possibly clean cachelines
contain - or will contain - stale data.
And then exactly when you actually invalidate is up to you.
For example, in
(2) The CPU now wants to see any state written by the device
you have multiple options. You could invalidate any stale cache lines.
Or you could say "We wrote them back and invalidated them in (1), so
we don't need to invalidate them now".
And in
(3) The CPU looked at the data while it was in flight and is now done with it.
you can (again) decide to do nothing at all, BUT ONLY IF (2) chose
that "invalidate" option. Because if you made your (2) depend on that
"they were already invalidated", then (3) has to invalidate the CPU
caches so that a subsequent (2) will work right.
So these are all somewhat interconnected.
You can do just "writeback" in (1), but then you _have_ to do
"invalidate" in (2), and in that case you don't have to do anything at
all in (3).
Or maybe your CPU only has that "writeback-and-invalidate" operation,
so you decide that (2) should be a no-op, and (3) - which is
presumably less common than (2) - also does that writeback-invalidate
thing.
Or we can also say that (3) is not allowed at all - so the ath9k case
is actively wrong and we should warn about that case - but that again
constrains what you can do in (2) and now that previous optimization
is not valid.
And it's worth noting that if your CPU may do cache fills as a result
of speculative accesses (or just sufficiently out of order), then the
whole argument that "I invalidated those lines earlier, so I don't
need to invalidate them now" is just garbage.
Fun, isn't it?
> Which starts to clarify in my mind why we have a sort of (implied)
> ownership model: if the CPU dirties a cacheline while the device has
> ownership then the cache writeback might overwrite the DMA data.
Right, I think that "if the CPU dirties the cacheline while the device
has ownership, then the data is going to be undefined".
And btw, it does actually potentially happen for real - imagine a user
mmap'ing the IO buffer while IO is in flight. The kernel can't control
for that (well, we can make things read-only, and some uses do), but
then it's often a question of "you have to dirty that area and do the
IO again, because the last attempt sent out undefined data".
And note how this "undefined data" situation can happen even with
coherent IO, so this part isn't even about cache coherency - it's
literally about just about memory accesses being in some undefined
order.
So it *can* be valid to send inconsistent data, but that should be
considered the really odd case.
> So it's
> easier to think of it as "CPU has ownership" and "device has ownership",
> but evidently that simple model breaks down in real-world cases such as
> ath9k where the CPU wants to look, but not write, and the device
> continues doing DMA at the same time.
Yeah, and see above about how the CPU could even write (but honestly,
that isn't valid in the general case, it really requires that kind of
active "we can fix it up later" thing).
> Well if you actually did dirty the cacheline, then you have a bug one
> way or the other, and it's going to be really hard to debug - either you
> lose the CPU write, or you lose the device write, there's no way you're
> not losing one of them?
Again, see above. Losing the CPU write is really bad, because then you
can't even recover by re-doing the operation.
So yes, when looking at only the "single operation" case, it looks
like "lose one or the other". But in the bigger picture, one is more
important than the other.
> So the operations
> (1) dma_sync_single_for_device(DMA_TO_DEVICE)
> (2) dma_sync_single_for_cpu(DMA_FROM_DEVICE)
> (3) dma_sync_single_for_device(DMA_FROM_DEVICE)
>
> really only (1) passes write ownership to the device, but then you can't
> get it back?
Well, you get it back by just checking that the IO is done. Once the
IO is done, the CPU owns the area again.
And the "IO is done" is often some entirely independent status in some
entirely different place.
But it *could* be something that requires a CPU read from that DMA
area. But it's a CPU _read_, so you don't need write ownership for
that.
That's why there is only one DMA_TO_DEVICE, and there are two
DMA_FROM_DEVICE cases.
The DMA_TO_DEVICE cannot have a "let me write in the middle" situation.
But the DMA_FROM_DEVICE has that "let me read in the middle, and
decide if it's done or not", so you can have a looping read, and
that's where (3) comes in.
You can't have a looping write for one operation (but you can
obviously have several independent write operations - that's what the
whole "are you done" is all about)
> But that cannot be true, because ath9k maps the buffer as
> DMA_BIDIRECTIONAL, and then eventually might want to recycle it.
See above. Both cases have "the device is done with this", but they
are fundamentally different situations.
> > That sounds really odd for
> > a "sync" operation. It sounds very reasonable for _allocating_ DMA,
> > but for syncing I'm left scratching my head what the semantics would
> > be.
>
> I agree.
>
> > But yes, if we do and people come up with semantics for it, those
> > semantics should be clearly documented.
>
> I'm not sure? I'm wondering if this isn't just because - like me
> initially - people misunderstood the direction argument, or didn't
> understand it well enough, and then just passed the same value as for
> the map()/unmap()?
Yeah, the solution may well be "grep for it, and pick the right
direction once the docs are clear".
Linus
Powered by blists - more mailing lists