netdev - Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e42e4c8bf35b62c671ec20ec6c21a43216e7daa6.camel@sipsolutions.net>
Date:   Fri, 25 Mar 2022 22:13:08 +0100
From:   Johannes Berg <johannes@...solutions.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Maxime Bizon <mbizon@...ebox.fr>,
        Toke Høiland-Jørgensen <toke@...e.dk>,
        Robin Murphy <robin.murphy@....com>,
        Christoph Hellwig <hch@....de>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Halil Pasic <pasic@...ux.ibm.com>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Kalle Valo <kvalo@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Olha Cherevyk <olha.cherevyk@...il.com>,
        iommu <iommu@...ts.linux-foundation.org>,
        linux-wireless <linux-wireless@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable <stable@...r.kernel.org>
Subject: Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break
 ath9k-based AP

On Fri, 2022-03-25 at 13:47 -0700, Linus Torvalds wrote:
> On Fri, Mar 25, 2022 at 1:38 PM Johannes Berg <johannes@...solutions.net> wrote:
> > 
> > >  (2) The CPU now wants to see any state written by the device since
> > > the last sync
> > > 
> > >     This is "dma_sync_single_for_cpu(DMA_FROM_DEVICE)".
> > > 
> > >     A bounce-buffer implementation needs to copy *from* the bounce buffer.
> > > 
> > >     A cache-coherent implementation needs to do nothing.
> > > 
> > >     A non-coherent implementation maybe needs to do nothing (ie it
> > > assumes that previous ops have flushed the cache, and just accessing
> > > the data will bring the rigth thing back into it). Or it could just
> > > flush the cache.
> > 
> > Doesn't that just need to *invalidate* the cache, rather than *flush*
> > it?
> 
> Yes.  I should have been more careful.

Well I see now that you said 'cache "writeback"' in (1), and 'flush' in
(2), so perhaps you were thinking of the same, and I'm just calling it
"flush" and "invalidate" respectively?

> That said, I think "invalidate without writeback" is a really
> dangerous operation (it can generate some *really* hard to debug
> memory state), so on the whole I think you should always strive to
> just do "flush-and-invalidate".

Hmm. Yeah, can't really disagree with that.

However, this seems to be the wrong spot to flush (writeback?) the
cache, as we're trying to get data from the device, not potentially
overwrite the data that the device wrote because we have a dirty
cacheline. Hmm. Then again, how could we possibly have a dirty
cacheline?


Which starts to clarify in my mind why we have a sort of (implied)
ownership model: if the CPU dirties a cacheline while the device has
ownership then the cache writeback might overwrite the DMA data. So it's
easier to think of it as "CPU has ownership" and "device has ownership",
but evidently that simple model breaks down in real-world cases such as
ath9k where the CPU wants to look, but not write, and the device
continues doing DMA at the same time.


Now in that case the cache wouldn't be considered dirty either since the
CPU was just reading, but who knows? Hence the suggestion of just
invalidate, not flush.

> If the core has support for "invalidate clean cache lines only", then
> that's possibly a good alternative.

Well if you actually did dirty the cacheline, then you have a bug one
way or the other, and it's going to be really hard to debug - either you
lose the CPU write, or you lose the device write, there's no way you're
not losing one of them?


ath9k doesn't write, of course, so hopefully the core wouldn't write
back what it must think of as clean cachelines, even if the device
modified the memory underneath already.


So really while I agree with your semantics, I was previously privately
suggesting to Toke we should probably have something like

dma_single_cpu_peek()
// read buffer and check if it was done
dma_single_cpu_peek_finish()

which really is equivalent to the current

dma_sync_single_for_cpu(DMA_FROM_DEVICE)
// ...
dma_sync_single_for_device(DMA_FROM_DEVICE)

that ath9k does, but makes it clearer that you really can't write to the
buffer... but, water under the bridge, I guess.

Thinking about the ownership model again - it seems that we need to at
least modify that ownership model in the sense that we have *write*
ownership that we pass around, not just "ownership". But if we do that,
then we need to clarify which operations pass write ownership and which
don't.

So the operations
(1) dma_sync_single_for_device(DMA_TO_DEVICE)
(2) dma_sync_single_for_cpu(DMA_FROM_DEVICE)
(3) dma_sync_single_for_device(DMA_FROM_DEVICE)

really only (1) passes write ownership to the device, but then you can't
get it back?

But that cannot be true, because ath9k maps the buffer as
DMA_BIDIRECTIONAL, and then eventually might want to recycle it.

Actually though, perhaps passing write ownership back to the CPU isn't
an operation that the DMA API needs to worry about - if the CPU has read
ownership and the driver knows separately that the device is no longer
accessing it, then basically the CPU already got write ownership, and
passes that back uses (1)?


> > Then, however, we need to define what happens if you pass
> > DMA_BIDIRECTIONAL to the sync_for_cpu() and sync_for_device() functions,
> > which adds two more cases? Or maybe we eventually just think that's not
> > valid at all, since you have to specify how you're (currently?) using
> > the buffer, which can't be DMA_BIDIRECTIONAL?
> 
> Ugh. Do we actually have cases that do it?

Yes, a few.

> That sounds really odd for
> a "sync" operation. It sounds very reasonable for _allocating_ DMA,
> but for syncing I'm left scratching my head what the semantics would
> be.

I agree.

> But yes, if we do and people come up with semantics for it, those
> semantics should be clearly documented.

I'm not sure? I'm wondering if this isn't just because - like me
initially - people misunderstood the direction argument, or didn't
understand it well enough, and then just passed the same value as for
the map()/unmap()?

You have to pass the size to all of them too, after all ... but I'm
speculating.

> And if we don't - or people can't come up with semantics for it - we
> should actively warn about it and not have some code that does odd
> things that we don't know what they mean.

Makes sense.

> But it sounds like you agree with my analysis, just not with some of
> my bad/incorrect word choices.

Yes.

johannes