netdev - Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220326003853.44c3285c.pasic@linux.ibm.com>
Date:   Sat, 26 Mar 2022 00:38:53 +0100
From:   Halil Pasic <pasic@...ux.ibm.com>
To:     Robin Murphy <robin.murphy@....com>
Cc:     mbizon@...ebox.fr, Linus Torvalds <torvalds@...ux-foundation.org>,
        Toke Høiland-Jørgensen <toke@...e.dk>,
        Netdev <netdev@...r.kernel.org>, Kalle Valo <kvalo@...nel.org>,
        linux-wireless <linux-wireless@...r.kernel.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        stable <stable@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        iommu <iommu@...ts.linux-foundation.org>,
        Olha Cherevyk <olha.cherevyk@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Halil Pasic <pasic@...ux.ibm.com>
Subject: Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break
 ath9k-based AP

On Fri, 25 Mar 2022 11:27:41 +0000
Robin Murphy <robin.murphy@....com> wrote:

> What muddies the waters a bit is that the opposite combination 
> sync_for_cpu(DMA_TO_DEVICE) really *should* always be a no-op, and I for 
> one have already made the case for eliding that in code elsewhere, but 
> it doesn't necessarily hold for the inverse here, hence why I'm not sure 
> there even is a robust common solution for peeking at a live 
> DMA_FROM_DEVICE buffer.

In https://lkml.org/lkml/2022/3/24/739 I also argued, that a robust
common solution for a peeking at a live DMA_FROM_DEVICE buffer is
probably not possible, at least not with the current programming model
as described by Documentation/core-api/dma-api.rst.

Namely AFAIU the programming model is based on exclusive ownership: the
buffer is either owned by the device, which means CPU(s) are not allowed
to *access* it, or it is owned by the CPU(s), and the device is not
allowed to *access* it. Do we agree on this?

Considering what Linus said here https://lkml.org/lkml/2022/3/24/775
I understand that: if the idea that dma_sync_*_for_{cpu,device} always
transfers ownership to the cpu and device respectively is abandoned, 
and we re-define ownership in a sense that only the owner may write,
but non-owner is allowed to read, then it may be possible to make the
scenario under discussion work. 

The scenario in pseudo code:

/* when invoked device might be doing DMA into buf */
rx_buf_complete(buf)
{
	prepare_peek(buf, DMA_FROM_DEVICE);
        if (!is_ready(buf)) {
                /*let device gain the buffer again*/
                peek_done_not_ready(buf, DMA_FROM_DEVICE);
                return false;
        }
	peek_done_ready(buf, DMA_FROM_DEVICE);
	process_buff(buf, DMA_FROM_DEVICE); is
}

IMHO it is pretty obvious, that prepare_peek() has to update the
cpu copy of the data *without* transferring ownership to the CPU. Since
the owner is still the device, it is legit for the device to keep
modifying the buffer via DMA. In case of the swiotlb, we would copy the
content of the bounce buffer to the orig buffer possibly after
invalidating
caches, and for non-swiotlb we would do invalidate caches. So
prepare_peek() could be actually something like,
dma_sync_single_for_cpu(buf, DMA_FROM_DEVICE,
                        DMA_ATTR_NO_OWNERSHIP_TRANSFER)
which would most end up being functionally the same, as without the
flag, since my guess is that the ownership is only tracked in our heads. 

For peek_done_not_ready() there is conceptually nothing to do, because
the device retained ownership. Thus would either have to mandate
peek_done_not_ready() being a nop, or non-existent, (that is
what Toke's patch does in the specific case), or we would have to
mandate that dma_sync_*_for_*() has no side effects under certain. The
former looks simpler to me, especially with swiotlb. But we are also
fine if the cache ain't dirty, because the CPU didn't write (as pointed
out by Linus) and we were to detect that, and avoid flushing a clean
cache, or if we were to track ownership and to avoid flushing caches
because no ownership transfer. But to avoid these bad flushes, at least
for swiotlb, we would either have to track cache ownership, or even
worse track dirtiness (for which we would have to extend the API, and
make the drivers tell us that the cache, i.e. the original buffer got
dirtied).

Since the device has ownership when peek_done_not_ready() is invoked,
we might need to transfer ownership to the CPU in peek_done_ready().
This could again be a dma_sync_for_cpu() with a flag, which when supplied
tells the dma API that no sync (cache invalidate) is needed because the
driver guarantees, that the whole mapping was sufficiently sync-ed by
prepare_peek(). Please notice, that the whole scheme is based on the
driver knowing that the whole DMA is done by examining the buffer, and
it decides based on whatever it sees.

Some of the ongoing discussion seem so ignore this whole ownership biz.
My feeling is: the notion of ownership useful. If both sides end up
modifying (and eventually flushing) we are in trouble IMHO, an ownership
avoids that. But if the conclusion ends up being, that ownership does
not matter, then we should make sure it is purged from the documentation,
because otherwise it will confuse the hell out of people who read
documentations and care about programming models. People like me.

Regards,
Halil