Date:   Thu, 16 Feb 2023 13:04:37 +0100
From:   Alexander Lobakin <aleksander.lobakin@...el.com>
To:     Jakub Kicinski <kuba@...nel.org>
CC:     Edward Cree <ecree.xilinx@...il.com>, <davem@...emloft.net>,
        <netdev@...r.kernel.org>, <edumazet@...gle.com>,
        <pabeni@...hat.com>, <willemb@...gle.com>, <fw@...len.de>
Subject: Re: [PATCH net-next 2/3] net: skbuff: cache one skb_ext for use by
 GRO

From: Jakub Kicinski <kuba@...nel.org>
Date: Wed, 15 Feb 2023 10:20:15 -0800

> On Wed, 15 Feb 2023 19:01:19 +0100 Alexander Lobakin wrote:
>>> I was hoping to leave sizing of the cache until we have some data from
>>> a production network (or at least representative packet traces).
>>>
>>> NAPI_SKB_CACHE_SIZE kinda assumes we're not doing much GRO, right?  
>>
>> It assumes we GRO a lot :D
>>
>> Imagine that you have 64 frames during one poll and the GRO layer
>> decides to coalesce them by batches of 16. Then only 4 skbs will be
>> used, the rest will go as frags (with "stolen heads") -> 60 of 64 skbs
>> will return to that skb cache and will then be reused by napi_build_skb().
> 
> Let's say 5 - for 4 resulting skbs GRO will need the 4 resulting and
> one extra to shuttle between the driver and GRO (worst case).
> With a cache of 1 I'm guaranteed to save 59 alloc calls, 92%, right?
> 
> That's why I'm saying - the larger cache would help workloads which
> don't GRO as much. Am I missing the point or how GRO works?

Maybe I'm missing something now :D

The driver receives 5 frames, so it allocates 5 skbs. GRO coalesces them
into one big skb: the first one remains an skb, the following 4 get
their data added as frags and are then moved to the NAPI cache
(%NAPI_GRO_FREE_STOLEN_HEAD).
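
To illustrate, a rough sketch of that recycling path in my own words
(not the actual kernel code, the helper name is made up):

	/* Roughly what happens to an skb whose head was stolen by GRO
	 * (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD); the
	 * function name is hypothetical, only for illustration.
	 */
	static void recycle_stolen_head(struct sk_buff *skb)
	{
		/* the data already lives as a frag of the coalesced skb,
		 * only the sk_buff struct itself is left to dispose of
		 */
		skb_ext_put(skb);	 /* extensions are dropped today... */
		napi_skb_cache_put(skb); /* ...while the struct goes back to
					  * the NAPI cache, so that
					  * napi_build_skb() picks it up on
					  * the next Rx
					  */
	}
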
Once GRO decides it's enough for this skb, it gets moved to the pending
list to be flushed soon. @gro_normal_batch is usually 8, which means
there can be up to 8...
Oh wait, Eric changed this to count segments, not skbs :D
...there can be up to 2* such skbs waiting for a flush (the first one
sets the counter to 5, the second adds 5 more => flush happens). So you
would anyway need at least 2* skb extensions cached, otherwise there
will be new allocations.
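
In (pseudo)code, roughly paraphrasing gro_normal_one() with the default
batch of 8 and the 5-segment skbs from your example:

	/* Paraphrase of gro_normal_one(), not a verbatim copy: the
	 * counter is in segments now, not skbs.
	 */
	list_add_tail(&skb->list, &napi->rx_list);
	napi->rx_count += NAPI_GRO_CB(skb)->count;	/* += 5 here */

	/* 1st coalesced skb: rx_count = 5  < 8 -> stays on the list
	 * 2nd coalesced skb: rx_count = 10 >= 8 -> flush
	 * => up to 2 skbs (and 2 extensions) can sit there at once
	 */
	if (napi->rx_count >= READ_ONCE(gro_normal_batch))
		gro_normal_list(napi);
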
This is not counting fraglists: when GRO decides to fraglist an skb, it
requires at least one more skb. UDP fraglisted GRO (I know almost nobody
uses it, but it does exist) doesn't use frags at all and requires one
skb per segment.
You're right that a cache size of %NAPI_POLL_WEIGHT is needed only for
corner cases like a big @gro_normal_batch, fraglists, UDP fraglisted GRO
and so on, but I still think we shouldn't ignore them :) Also, this
cache could then be reused later to bulk-free extensions on Tx
completion, just like it's done for skbs.
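
Something along these lines (a purely hypothetical sketch, all the
skb_ext_cache* names are invented; only kmem_cache_free_bulk(),
skbuff_ext_cache, NAPI_POLL_WEIGHT and the per-CPU helpers are real),
mirroring what napi_skb_cache_put() does for sk_buffs:

	/* Hypothetical per-CPU skb_ext cache, names made up for this
	 * sketch. Keep a small stash per CPU and return halves to the
	 * slab in bulk, like the NAPI skb cache does.
	 */
	#define EXT_CACHE_SIZE	NAPI_POLL_WEIGHT	/* 64 by default */

	struct skb_ext_cache {
		unsigned int	count;
		struct skb_ext	*exts[EXT_CACHE_SIZE];
	};

	static DEFINE_PER_CPU(struct skb_ext_cache, skb_ext_cache);

	/* Tx completion (BH context assumed, like the skb cache), called
	 * only once the extension's refcount has dropped to zero: stash
	 * it instead of freeing it right away.
	 */
	static void skb_ext_cache_put(struct skb_ext *ext)
	{
		struct skb_ext_cache *c = this_cpu_ptr(&skb_ext_cache);

		if (c->count == EXT_CACHE_SIZE) {
			/* full: hand the upper half back to the slab in
			 * one go
			 */
			kmem_cache_free_bulk(skbuff_ext_cache,
					     EXT_CACHE_SIZE / 2,
					     (void **)&c->exts[EXT_CACHE_SIZE / 2]);
			c->count = EXT_CACHE_SIZE / 2;
		}

		c->exts[c->count++] = ext;
	}
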

* or less/more if customized by the user; for example, I set 16 on MIPS,
while x86_64 works better with 8.

Thanks,
Olek
