netdev - Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <498a25e4f7ba4e21d688ca74f335b28cadcb3381.camel@redhat.com>
Date:   Thu, 08 Sep 2022 13:00:50 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     Alexander H Duyck <alexander.duyck@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     netdev <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Alexander Duyck <alexanderduyck@...com>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Greg Thelen <gthelen@...gle.com>
Subject: Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny
 skbs

On Wed, 2022-09-07 at 14:36 -0700, Alexander H Duyck wrote:
> On Wed, 2022-09-07 at 22:19 +0200, Paolo Abeni wrote:
> > What outlined above will allow for 10 min size frags in page_order0, as
> > (SKB_DATA_ALIGN(0) + SKB_DATA_ALIGN(struct skb_shared_info) == 384. I'm
> > not sure that anything will allocate such small frags.
> > With a more reasonable GRO_MAX_HEAD, there will be 6 frags per page. 
> 
> That doesn't account for any headroom though. 

Yes, the 0-size data packet was just a theoretical example to make the
really worst case scenario.

> Most of the time you have
> to reserve some space for headroom so that if this buffer ends up
> getting routed off somewhere to be tunneled there is room for adding to
> the header. I think the default ends up being NET_SKB_PAD, though many
> NICs use larger values. So adding any data onto that will push you up
> to a minimum of 512 per skb for the first 64B for header data.
> 
> With that said it would probably put you in the range of 8 or fewer
> skbs per page assuming at least 1 byte for data:
>   512 =	SKB_DATA_ALIGN(NET_SKB_PAD + 1) +
> 	SKB_DATA_ALIGN(struct skb_shared_info)

In most build GRO_MAX_HEAD packets are even larger (should be 640)

> > The maximum truesize underestimation in both cases will be lower than
> > what we can get with the current code in the worst case (almost 32x
> > AFAICS). 
> > 
> > Is the above schema safe enough or should the requested size
> > artificially inflatted to fit at most 4 allocations per page_order0?
> > Am I miss something else? Apart from omitting a good deal of testing in
> > the above list ;) 
> 
> If we are working with an order 0 page we may just want to split it up
> into a fixed 1K fragments and not bother with a variable pagecnt bias.
> Doing that we would likely simplify this quite a bit and avoid having
> to do as much page count manipulation which could get expensive if we
> are not getting many uses out of the page. An added advantage is that
> we can get rid of the pagecnt_bias and just work based on the page
> offset.
> 
> As such I am not sure the page frag cache would really be that good of
> a fit since we have quite a bit of overhead in terms of maintaining the
> pagecnt_bias which assumes the page is a bit longer lived so the ratio
> of refcnt updates vs pagecnt_bias updates is better.

I see. With the above schema there will be 4-6 frags per packet. I'm
wild guessing that the pagecnt_bias optimization still give some gain
in that case, but I really shold collect some data points.

If the pagecnt optimization should be dropped, it would be probably
more straight-forward to use/adapt 'page_frag' for the page_order0
allocator.

BTW it's quite strange/confusing having to very similar APIs (page_frag
and page_frag_cache) with very similar names and no references between
them.

Thanks!

Paolo