Message-ID: <b791d25d-8417-06e5-8e8b-6a9d3195c807@intel.com>
Date:   Thu, 20 Apr 2023 18:42:17 +0200
From:   Alexander Lobakin <aleksander.lobakin@...el.com>
To:     Christoph Hellwig <hch@...radead.org>
CC:     Jakub Kicinski <kuba@...nel.org>,
        Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
        <netdev@...r.kernel.org>,
        Björn Töpel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        "Maciej Fijalkowski" <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        "Eric Dumazet" <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>,
        "Alexei Starovoitov" <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        <bpf@...r.kernel.org>, <virtualization@...ts.linux-foundation.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Guenter Roeck <linux@...ck-us.net>,
        Gerd Hoffmann <kraxel@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jens Axboe <axboe@...nel.dk>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH net-next] xsk: introduce xsk_dma_ops

From: Christoph Hellwig <hch@...radead.org>
Date: Thu, 20 Apr 2023 09:15:23 -0700

> On Thu, Apr 20, 2023 at 03:59:39PM +0200, Alexander Lobakin wrote:
>> Hmm, currently almost all Ethernet drivers map Rx pages once and then
>> just recycle them, keeping the original DMA mapping. Which means pages
>> can have the same first mapping for very long time, often even for the
>> lifetime of the struct device. Same for XDP sockets, the lifetime of DMA
>> mappings equals the lifetime of sockets.
>> Does it mean we'd better review that approach and try switching to
>> dma_alloc_*() family (non-coherent/caching in our case)?
> 
> Yes, exactly.  dma_alloc_noncoherent can be used exactly as alloc_pages
> + dma_map_* by the driver (including the dma_sync_* calls on reuse), but
> has a huge number of advantages.
> 
>> Also, I remember I tried to do that for one my driver, but the thing
>> that all those functions zero the whole page(s) before returning them to
>> the driver ruins the performance -- we don't need to zero buffers for
>> receiving packets and spend a ton of cycles on it (esp. in cases when 4k
>> gets zeroed each time, but your main body of traffic is 64-byte frames).
> 
> Hmm, the single zeroing when doing the initial allocation shows up
> in these profiles?

When there's no recycling of pages, then yes. And since recycling is
done asynchronously, sometimes new allocations happen either way.
Anyway, that was roughly a couple of years ago, right when you introduced
dma_alloc_noncoherent(). Things might've changed since then.
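For context, the pattern described above (dma_alloc_noncoherent() used in
place of alloc_pages() + dma_map_*, with dma_sync_* on reuse) would look
roughly like this in a driver. This is only a sketch; struct rx_buf and the
function names are invented for illustration:

```c
/* Hypothetical Rx buffer setup replacing alloc_pages() + dma_map_page().
 * dma_alloc_noncoherent() returns both the CPU address and the DMA
 * address in one call; the driver still does its own cache maintenance.
 */
struct rx_buf {
	void		*va;
	dma_addr_t	dma;
};

static int rx_buf_alloc(struct device *dev, struct rx_buf *buf)
{
	buf->va = dma_alloc_noncoherent(dev, PAGE_SIZE, &buf->dma,
					DMA_FROM_DEVICE, GFP_KERNEL);
	if (!buf->va)
		return -ENOMEM;

	return 0;
}

/* On recycle, only a sync is needed before handing the buffer back to
 * the HW -- same as with dma_map_page()-based recycling.
 */
static void rx_buf_reuse(struct device *dev, struct rx_buf *buf)
{
	dma_sync_single_for_device(dev, buf->dma, PAGE_SIZE,
				   DMA_FROM_DEVICE);
}
```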
I could try again while net-next is closed (i.e. starting this Sunday); the
only thing I'd like to mention: Page Pool allocates pages via
alloc_pages_bulk_array_node(). Bulking helps a lot (and PP uses bulks of
16 IIRC), and explicit node setting helps when Rx queues are distributed
between several nodes. We can then have one struct device serving several nodes.
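Roughly, the Page Pool refill path boils down to something like the sketch
below (simplified; the function name and bulk size are illustrative, though
16 matches what I remember PP using):

```c
/* Simplified Page Pool-style refill: bulk-allocate pages onto the NUMA
 * node the Rx queue lives on, so buffers stay node-local regardless of
 * which node the struct device itself sits on.
 */
#define PP_ALLOC_BULK	16

static unsigned long pp_refill_bulk(int nid, struct page **pages)
{
	/* Returns the number of pages actually placed into @pages;
	 * may be fewer than requested under memory pressure.
	 */
	return alloc_pages_bulk_array_node(GFP_ATOMIC, nid,
					   PP_ALLOC_BULK, pages);
}
```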
As far as I can see, there's currently no function to allocate in bulk and no
explicit node setting option (e.g. mlx5 works around this using
set_dev_node() + allocate + set_dev_node(orig_node)). Could such options
be added in the near future? That would help a lot in switching to the
functions intended for use when DMA mappings can stay around for a long time.
From what I see in the code, that shouldn't be a problem (except for
non-direct DMA cases, where we'd need to introduce new callbacks or
extend the existing ones).
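To make the mlx5-style workaround concrete, it amounts to something like
this (a sketch, not the actual mlx5 code; the helper name is invented):

```c
/* Temporarily repoint the device at the target NUMA node so the
 * allocator picks memory from there, then restore the original node.
 * This is the set_dev_node() + allocate + set_dev_node(orig) dance.
 */
static void *dma_alloc_noncoherent_node(struct device *dev, int nid,
					size_t size, dma_addr_t *dma)
{
	int orig_nid = dev_to_node(dev);
	void *va;

	set_dev_node(dev, nid);
	va = dma_alloc_noncoherent(dev, size, dma, DMA_FROM_DEVICE,
				   GFP_KERNEL);
	set_dev_node(dev, orig_nid);

	return va;
}
```

A native node parameter (or a bulk variant taking a nid) would make this
dance unnecessary.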

Thanks,
Olek
