Message-ID: <5cb9703b-7a6c-7f75-b8d8-38095537e8dc@gmail.com>
Date: Tue, 21 Mar 2023 09:13:59 +0200
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Tariq Toukan <tariqt@...dia.com>
Subject: Re: [PATCH net-next] net: introduce a config option to tweak
MAX_SKB_FRAGS
On 21/03/2023 5:37, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@...gle.com>
>
> Currently, MAX_SKB_FRAGS value is 17.
>
> For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
> attempts order-3 allocations, stuffing 32768 bytes per frag.
>
> But with zero copy, we use order-0 pages.
>
> For BIG TCP to show its full potential, we add a config option
> to be able to fit up to 45 segments per skb.
>
> This is also needed for BIG TCP rx zerocopy, as zerocopy currently
> does not support skbs with frag list.
>
> We have used MAX_SKB_FRAGS=45 value for years [1] at Google before
> we deployed 4K MTU, with no adverse effect, other than
> a recent issue in mlx4, fixed in commit 26782aad00cc
> ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS")
>
> [1] Back then, goal was to be able to receive full size (64KB) GRO
> packets without the frag_list overhead.
>
> By default we keep the old/legacy value of 17 until we get
> more coverage for the updated values.
>
> Sizes of struct skb_shared_info on 64bit arches:
>
> MAX_SKB_FRAGS | sizeof(struct skb_shared_info)
> ==============================================
> 17 320
> 21 320+64 = 384
> 25 320+128 = 448
> 29 320+192 = 512
> 33 320+256 = 576
> 37 320+320 = 640
> 41 320+384 = 704
> 45 320+448 = 768
>
> This inflation might cause problems for drivers assuming they could pack
> both the incoming packet and skb_shared_info in half a page, using build_skb().
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
> include/linux/skbuff.h | 14 ++------------
> net/Kconfig | 12 ++++++++++++
> 2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index fe661011644b8f468ff5e92075a6624f0557584c..43726ca7d20f232461a4d2e5b984032806e9c13e 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -345,18 +345,8 @@ struct sk_buff_head {
>
> struct sk_buff;
>
> -/* To allow 64K frame to be packed as single skb without frag_list we
> - * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
> - * buffers which do not start on a page boundary.
> - *
> - * Since GRO uses frags we allocate at least 16 regardless of page
> - * size.
> - */
> -#if (65536/PAGE_SIZE + 1) < 16
> -#define MAX_SKB_FRAGS 16UL
The default value changes for this case (PAGE_SIZE > 4K, where the old code
gave 16, while the new Kconfig minimum is 17).
Shouldn't we preserve it?
> -#else
> -#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
> -#endif
> +#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
> +
> extern int sysctl_max_skb_frags;
>
> /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
> diff --git a/net/Kconfig b/net/Kconfig
> index 48c33c2221999e575c83a409ab773b9cc3656eab..f806722bccf450c62e07bfdb245e5195ac4a156d 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -251,6 +251,18 @@ config PCPU_DEV_REFCNT
> network device refcount are using per cpu variables if this option is set.
> This can be forced to N to detect underflows (with a performance drop).
>
> +config MAX_SKB_FRAGS
> + int "Maximum number of fragments per skb_shared_info"
> + range 17 45
> + default 17
> + help
> + Having more fragments per skb_shared_info can help GRO efficiency.
> + This helps BIG TCP workloads, but might expose bugs in some
> + legacy drivers.
> + This also increases memory overhead of small packets,
> + and in drivers using build_skb().
> + If unsure, say 17.
> +
> config RPS
> bool
> depends on SMP && SYSFS