Message-ID: <5cb9703b-7a6c-7f75-b8d8-38095537e8dc@gmail.com>
Date: Tue, 21 Mar 2023 09:13:59 +0200
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Tariq Toukan <tariqt@...dia.com>
Subject: Re: [PATCH net-next] net: introduce a config option to tweak
MAX_SKB_FRAGS
On 21/03/2023 5:37, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@...gle.com>
>
> Currently, MAX_SKB_FRAGS value is 17.
>
> For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
> attempts order-3 allocations, stuffing 32768 bytes per frag.
>
> But with zero copy, we use order-0 pages.
>
> For BIG TCP to show its full potential, we add a config option
> to be able to fit up to 45 segments per skb.
>
> This is also needed for BIG TCP rx zerocopy, as zerocopy currently
> does not support skbs with frag list.
>
> We have used MAX_SKB_FRAGS=45 value for years [1] at Google before
> we deployed 4K MTU, with no adverse effect, other than
> a recent issue in mlx4, fixed in commit 26782aad00cc
> ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS")
>
> [1] Back then, goal was to be able to receive full size (64KB) GRO
> packets without the frag_list overhead.
>
> By default we keep the old/legacy value of 17 until we get
> more coverage for the updated values.
>
> Sizes of struct skb_shared_info on 64bit arches:
>
> MAX_SKB_FRAGS | sizeof(struct skb_shared_info)
> ==============================================
> 17 320
> 21 320+64 = 384
> 25 320+128 = 448
> 29 320+192 = 512
> 33 320+256 = 576
> 37 320+320 = 640
> 41 320+384 = 704
> 45 320+448 = 768
>
> This inflation might cause problems for drivers assuming they could pack
> both the incoming packet and skb_shared_info in half a page, using build_skb().
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
> include/linux/skbuff.h | 14 ++------------
> net/Kconfig | 12 ++++++++++++
> 2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index fe661011644b8f468ff5e92075a6624f0557584c..43726ca7d20f232461a4d2e5b984032806e9c13e 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -345,18 +345,8 @@ struct sk_buff_head {
>
> struct sk_buff;
>
> -/* To allow 64K frame to be packed as single skb without frag_list we
> - * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
> - * buffers which do not start on a page boundary.
> - *
> - * Since GRO uses frags we allocate at least 16 regardless of page
> - * size.
> - */
> -#if (65536/PAGE_SIZE + 1) < 16
> -#define MAX_SKB_FRAGS 16UL
The default value changes for this case (PAGE_SIZE > 4K, where the old code
gave 16, while the new Kconfig minimum is 17).
Shouldn't we preserve it?
> -#else
> -#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
> -#endif
> +#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
> +
> extern int sysctl_max_skb_frags;
>
> /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
> diff --git a/net/Kconfig b/net/Kconfig
> index 48c33c2221999e575c83a409ab773b9cc3656eab..f806722bccf450c62e07bfdb245e5195ac4a156d 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -251,6 +251,18 @@ config PCPU_DEV_REFCNT
> network device refcount are using per cpu variables if this option is set.
> This can be forced to N to detect underflows (with a performance drop).
>
> +config MAX_SKB_FRAGS
> + int "Maximum number of fragments per skb_shared_info"
> + range 17 45
> + default 17
> + help
> + Having more fragments per skb_shared_info can help GRO efficiency.
> + This helps BIG TCP workloads, but might expose bugs in some
> + legacy drivers.
> + This also increases memory overhead of small packets,
> + and in drivers using build_skb().
> + If unsure, say 17.
> +
> config RPS
> bool
> depends on SMP && SYSFS