[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b8e1ef0d-20ae-0ea1-3c29-fc8db96e2afb@intel.com>
Date: Wed, 1 Jul 2020 12:17:50 +0200
From: Björn Töpel <bjorn.topel@...el.com>
To: Robin Murphy <robin.murphy@....com>,
Christoph Hellwig <hch@....de>,
Daniel Borkmann <daniel@...earbox.net>
Cc: maximmi@...lanox.com, konrad.wilk@...cle.com,
jonathan.lemon@...il.com, linux-kernel@...r.kernel.org,
iommu@...ts.linux-foundation.org, netdev@...r.kernel.org,
bpf@...r.kernel.org, davem@...emloft.net, magnus.karlsson@...el.com
Subject: Re: [PATCH net] xsk: remove cheap_dma optimization
On 2020-06-29 17:41, Robin Murphy wrote:
> On 2020-06-28 18:16, Björn Töpel wrote:
[...]>
>> Somewhat related to the DMA API; It would have performance benefits for
>> AF_XDP if the DMA range of the mapped memory was linear, i.e. by IOMMU
>> utilization. I've started hacking a thing a little bit, but it would be
>> nice if such API was part of the mapping core.
>>
>> Input: array of pages Output: array of dma addrs (and obviously dev,
>> flags and such)
>>
>> For non-IOMMU len(array of pages) == len(array of dma addrs)
>> For best-case IOMMU len(array of dma addrs) == 1 (large linear space)
>>
>> But that's for later. :-)
>
> FWIW you will typically get that behaviour from IOMMU-based
> implementations of dma_map_sg() right now, although it's not strictly
> guaranteed. If you can weather some additional setup cost of calling
> sg_alloc_table_from_pages() plus walking the list after mapping to test
> whether you did get a contiguous result, you could start taking
> advantage of it as some of the dma-buf code in DRM and v4l2 does already
> (although those cases actually treat it as a strict dependency rather
> than an optimisation).
>
> I'm inclined to agree that if we're going to see more of these cases, a
> new API call that did formally guarantee a DMA-contiguous mapping
> (either via IOMMU or bounce buffering) or failure might indeed be handy.
>
I forgot to reply to this one! My current hack is using the iommu code
directly, similar to what vfio-pci does (hopefully not gutting the API
this time ;-)).
Your approach sound much nicer, and easier. I'll try that out! Thanks a
lot for the pointers, and I might be back with more questions.
Cheers,
Björn
> Robin.
Powered by blists - more mailing lists