netdev - Re: [PATCH net] xsk: remove cheap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <88d27e1b-dbda-301c-64ba-2391092e3236@intel.com>
Date:   Sun, 28 Jun 2020 19:16:33 +0200
From:   Björn Töpel <bjorn.topel@...el.com>
To:     Christoph Hellwig <hch@....de>,
        Daniel Borkmann <daniel@...earbox.net>
Cc:     Björn Töpel <bjorn.topel@...il.com>,
        netdev@...r.kernel.org, davem@...emloft.net,
        konrad.wilk@...cle.com, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
        maximmi@...lanox.com, magnus.karlsson@...el.com,
        jonathan.lemon@...il.com
Subject: Re: [PATCH net] xsk: remove cheap_dma optimization

On 2020-06-27 09:04, Christoph Hellwig wrote:
> On Sat, Jun 27, 2020 at 01:00:19AM +0200, Daniel Borkmann wrote:
>> Given there is roughly a ~5 weeks window at max where this removal could
>> still be applied in the worst case, could we come up with a fix / proposal
>> first that moves this into the DMA mapping core? If there is something that
>> can be agreed upon by all parties, then we could avoid re-adding the 9%
>> slowdown. :/
> 
> I'd rather turn it upside down - this abuse of the internals blocks work
> that has basically just missed the previous window and I'm not going
> to wait weeks to sort out the API misuse.  But we can add optimizations
> back later if we find a sane way.
>

I'm not super excited about the performance loss, but I do get
Christoph's frustration about gutting the DMA API making it harder for
DMA people to get work done. Lets try to solve this properly using
proper DMA APIs.

> That being said I really can't see how this would make so much of a
> difference.  What architecture and what dma_ops are you using for
> those measurements?  What is the workload?
> 

The 9% is for an AF_XDP (Fast raw Ethernet socket. Think AF_PACKET, but 
faster.) benchmark: receive the packet from the NIC, and drop it. The 
DMA syncs stand out in the perf top:

   28.63%  [kernel]                   [k] i40e_clean_rx_irq_zc
   17.12%  [kernel]                   [k] xp_alloc
    8.80%  [kernel]                   [k] __xsk_rcv_zc
    7.69%  [kernel]                   [k] xdp_do_redirect
    5.35%  bpf_prog_992d9ddc835e5629  [k] bpf_prog_992d9ddc835e5629
    4.77%  [kernel]                   [k] xsk_rcv.part.0
    4.07%  [kernel]                   [k] __xsk_map_redirect
    3.80%  [kernel]                   [k] dma_direct_sync_single_for_cpu
    3.03%  [kernel]                   [k] dma_direct_sync_single_for_device
    2.76%  [kernel]                   [k] i40e_alloc_rx_buffers_zc
    1.83%  [kernel]                   [k] xsk_flush
...

For this benchmark the dma_ops are NULL (dma_is_direct() == true), and
the main issue is that SWIOTLB is now unconditionally enabled [1] for
x86, and for each sync we have to check that if is_swiotlb_buffer()
which involves a some costly indirection.

That was pretty much what my hack avoided. Instead we did all the checks
upfront, since AF_XDP has long-term DMA mappings, and just set a flag
for that.

Avoiding the whole "is this address swiotlb" in
dma_direct_sync_single_for_{cpu, device]() per-packet
would help a lot.

Somewhat related to the DMA API; It would have performance benefits for
AF_XDP if the DMA range of the mapped memory was linear, i.e. by IOMMU
utilization. I've started hacking a thing a little bit, but it would be
nice if such API was part of the mapping core.

Input: array of pages Output: array of dma addrs (and obviously dev,
flags and such)

For non-IOMMU len(array of pages) == len(array of dma addrs)
For best-case IOMMU len(array of dma addrs) == 1 (large linear space)

But that's for later. :-)

Björn

[1] commit: 09230cbc1bab ("swiotlb: move the SWIOTLB config symbol to 
lib/Kconfig")