netdev - Re: [PATCH net-next v6 04/18] mm: Make the page_frag

Message-ID: <399350.1681341160@warthog.procyon.org.uk>
Date:   Thu, 13 Apr 2023 00:12:40 +0100
From:   David Howells <dhowells@...hat.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     dhowells@...hat.com, "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        David Ahern <dsahern@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Jens Axboe <axboe@...nel.dk>, Jeff Layton <jlayton@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        Chuck Lever III <chuck.lever@...cle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jeroen de Borst <jeroendb@...gle.com>,
        Catherine Sullivan <csully@...gle.com>,
        Shailend Chand <shailend@...gle.com>,
        Felix Fietkau <nbd@....name>, John Crispin <john@...ozen.org>,
        Sean Wang <sean.wang@...iatek.com>,
        Mark Lee <Mark-MC.Lee@...iatek.com>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
        Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        Chaitanya Kulkarni <kch@...dia.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        netdev@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org, linux-nvme@...ts.infradead.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v6 04/18] mm: Make the page_frag_cache allocator use per-cpu

Christoph Hellwig <hch@...radead.org> wrote:

> On Tue, Apr 11, 2023 at 05:08:48PM +0100, David Howells wrote:
> > Make the page_frag_cache allocator have a separate allocation bucket for
> > each cpu to avoid racing.  This means that no lock is required, other than
> > preempt disablement, to allocate from it, though if a softirq wants to
> > access it, then softirq disablement will need to be added.
> ...
> Let me ask a third time as I've not got an answer the last two times:

Sorry about that.  I think the problem is that your direct copy of the message
arrives after the copy that comes via a mailing list, and Google then deletes
the direct one (as obviously no one could possibly want duplicates, right? :-/),
so you usually get consigned to my linux-kernel or linux-fsdevel mailing list
folder.

> > Make the NVMe, mediatek and GVE drivers pass in NULL to page_frag_cache()
> > and use the default allocation buckets rather than defining their own.
> 
> why are these callers treated different from the others?

There are only four users of struct page_frag_cache, the ones these patches
modify:

 (1) GVE.
 (2) Mediatek.
 (3) NVMe.
 (4) skbuff.

Note that things are slightly confused by there being three very similarly
named frag allocators (page_frag and page_frag_1k in addition to
page_frag_cache), and by the __page_frag_cache_drain() function being used
for things other than just page_frag_cache.
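For illustration, a userspace model of what __page_frag_cache_drain() does
upstream: it drops a batch of page references in one go and frees the page if
those were the last ones, without touching struct page_frag_cache at all, which
is presumably why other code can use it too.  struct sim_page and
sim_page_frag_cache_drain() are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page and its refcount. */
struct sim_page {
	unsigned int refcount;
	bool freed;
};

/*
 * Modelled on __page_frag_cache_drain(page, count): drop 'count' references
 * at once and free the page if they were the last ones.  Nothing here
 * depends on struct page_frag_cache itself.
 */
static void sim_page_frag_cache_drain(struct sim_page *page, unsigned int count)
{
	/* Stricter than upstream, which only checks the count isn't already 0. */
	assert(page->refcount >= count);
	page->refcount -= count;
	if (page->refcount == 0)
		page->freed = true;	/* stand-in for freeing the page */
}
```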

I've replaced the single allocation buckets with per-cpu allocation buckets
for (1), (2) and (3) so that no locking[*] is required other than temporarily
pinning the task to the cpu, but I can't test them as I don't have the
hardware.

[*] Note that what's upstream doesn't have locking, and I'm not sure all the
    users of it are SMP-safe.
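A minimal userspace sketch of the per-cpu bucket idea, assuming a
single-threaded simulation in which sim_current_cpu stands in for
smp_processor_id() and sim_get_cpu()/sim_put_cpu() stand in for the
preempt-disabling get_cpu()/put_cpu().  All sim_* names are hypothetical; this
is not the patch's code.  Each bucket carves fragments top-down from a
page-aligned buffer and holds one reference for itself plus one per live
fragment, loosely following how page_frag_cache works.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_NR_CPUS   4
#define SIM_PAGE_SIZE 4096u

/* Header at the start of each page-aligned "page": live frags + 1 for the cache. */
struct sim_page { unsigned int refs; };

/* Hypothetical per-cpu bucket, loosely modelled on struct page_frag_cache. */
struct sim_frag_bucket {
	struct sim_page *page;	/* NULL until the first allocation */
	size_t offset;		/* next frag ends here; carved top-down */
};

static struct sim_frag_bucket sim_buckets[SIM_NR_CPUS];
int sim_current_cpu;	/* hypothetical stand-in for smp_processor_id() */

/* Stand-ins for get_cpu()/put_cpu(), i.e. preempt_disable()/preempt_enable(). */
static int sim_get_cpu(void) { return sim_current_cpu; }
static void sim_put_cpu(void) { }

/* Plain counter: only safe because this simulation is single-threaded. */
static void sim_page_put(struct sim_page *pg)
{
	if (--pg->refs == 0)
		free(pg);
}

void *sim_frag_alloc(size_t fragsz)
{
	struct sim_frag_bucket *b;
	void *p = NULL;
	int cpu;

	if (fragsz == 0 || fragsz > SIM_PAGE_SIZE - sizeof(struct sim_page))
		return NULL;

	cpu = sim_get_cpu();	/* pinned: nobody else touches this bucket */
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	b = &sim_buckets[cpu];
	if (!b->page || b->offset < sizeof(struct sim_page) + fragsz) {
		struct sim_page *pg = aligned_alloc(SIM_PAGE_SIZE, SIM_PAGE_SIZE);

		if (!pg)
			goto out;
		if (b->page)
			sim_page_put(b->page);	/* drop the cache's own reference */
		pg->refs = 1;			/* the cache's reference */
		b->page = pg;
		b->offset = SIM_PAGE_SIZE;
	}
	b->offset -= fragsz;		/* frags are byte-aligned in this sketch */
	b->page->refs++;		/* one reference per live frag */
	p = (unsigned char *)b->page + b->offset;
out:
	sim_put_cpu();
	return p;
}

void sim_frag_free(void *p)
{
	/* Like virt_to_page(): pages are SIM_PAGE_SIZE-aligned. */
	sim_page_put((struct sim_page *)((uintptr_t)p & ~(uintptr_t)(SIM_PAGE_SIZE - 1)));
}
```

Because a bucket is only ever touched by whichever task is pinned to that
cpu, the allocation path needs no lock.  Frees can come from any cpu, though,
so a real implementation needs atomic page refcounts; the plain counter above
is only safe because the sketch is single-threaded.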

That leaves (4).

Upstream, skbuff.c creates two separate per-cpu frag caches, and I've elected
to retain both, as I'm not entirely sure why there's a napi frag cache
separate from netdev_alloc_cache.  The per-cpu bits, however, now live inside
the frag allocator.
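A sketch of moving the per-cpu bits inside the frag allocator, assuming a
cache object that owns one bucket per cpu and a NULL cache pointer meaning
"use the shared default buckets", mirroring the quoted description of the
NVMe, mediatek and GVE drivers passing in NULL.  skbuff.c-style users can
still keep two distinct instances.  All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 4

/* Hypothetical per-cpu bucket: current page and remaining offset. */
struct sim_frag_bucket {
	void *page;
	size_t offset;
};

/* Hypothetical cache object that owns one bucket per cpu internally. */
struct sim_frag_cache {
	struct sim_frag_bucket buckets[SIM_NR_CPUS];
};

/* Shared default buckets, used when a caller passes NULL. */
static struct sim_frag_cache sim_default_cache;

/* skbuff.c-style callers can still keep separate instances of their own. */
static struct sim_frag_cache sim_netdev_cache;
static struct sim_frag_cache sim_napi_cache;

/* Select the bucket for 'cpu'; the caller must already be pinned to it. */
static struct sim_frag_bucket *sim_pick_bucket(struct sim_frag_cache *cache, int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	if (!cache)
		cache = &sim_default_cache;
	return &cache->buckets[cpu];
}
```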

The general page_frag_cache allocator is used by skb_splice_from_iter() if it
encounters a page it can't take a ref on, so it has been tested through that
using sunrpc, sunrpc+siw and cifs+siw.
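A rough model of that fallback: a segment whose page can take a reference is
attached zero-copy, while anything else is copied into a freshly allocated
frag.  The refable flag, the fixed-size pool standing in for page_frag_cache
and all sim_* names are assumptions for illustration; the real
skb_splice_from_iter() works on an iov_iter and skb frags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_FRAG_POOL 4096

/* Hypothetical source segment from an iterator, e.g. one bvec. */
struct sim_seg {
	const void *data;
	size_t len;
	bool refable;	/* hypothetical: can we take a page reference on it? */
};

/* Hypothetical skb fragment: points at the caller's page or at a copy. */
struct sim_skb_frag {
	const void *data;
	size_t len;
	bool copied;
};

/* Trivial bump allocator standing in for page_frag_cache (no refill, no free). */
static unsigned char sim_pool[SIM_FRAG_POOL];
static size_t sim_pool_used;

static void *sim_frag_alloc(size_t len)
{
	void *p;

	if (len > SIM_FRAG_POOL - sim_pool_used)
		return NULL;
	p = sim_pool + sim_pool_used;
	sim_pool_used += len;
	return p;
}

/* Splice one segment: zero-copy when possible, otherwise copy into a frag. */
static bool sim_splice_seg(const struct sim_seg *seg, struct sim_skb_frag *out)
{
	assert(seg != NULL && out != NULL);
	if (seg->refable) {
		out->data = seg->data;	/* the kernel would take a page ref here */
		out->copied = false;
	} else {
		void *buf = sim_frag_alloc(seg->len);

		if (!buf)
			return false;
		memcpy(buf, seg->data, seg->len);
		out->data = buf;
		out->copied = true;
	}
	out->len = seg->len;
	return true;
}
```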

> Can you show any performance numbers?

As far as I can tell, it makes no obvious performance difference when directly
pumping data through TCP or TLS over TCP, or when transferring data over a
network filesystem such as sunrpc or cifs using siw/TCP.  I've tested this
between two machines over a 1G link and a 10G link.

I can generate some actual numbers tomorrow.

Actually, I can probably drop patches 2-4 from this patchset and just use the
netdev_alloc_cache in skb_splice_from_iter() for now.  Since that copies
unspliceable data, I no longer need to allocate frags in the next layer up.

David