lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 3 Feb 2016 07:58:01 -0800
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Alexey Kuznetsov <kuznet@....inr.ac.ru>,
	James Morris <jmorris@...ei.org>,
	Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
	Patrick McHardy <kaber@...sh.net>,
	Tom Herbert <tom@...bertland.com>,
	Pablo Neira Ayuso <pablo@...filter.org>,
	Eric Dumazet <edumazet@...gle.com>,
	Florian Westphal <fw@...len.de>, Jiri Pirko <jiri@...nulli.us>,
	Alexander Duyck <alexander.h.duyck@...hat.com>,
	Michal Hocko <mhocko@...e.com>,
	Linus Lüssing <linus.luessing@...3.blue>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Tejun Heo <tj@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Alexey Kodanev <alexey.kodanev@...cle.com>,
	Håkon Bugge <haakon.bugge@...cle.com>,
	open list <linux-kernel@...r.kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>
Subject: Re: [PATCH v3] net:Add sysctl_max_skb_frags

On Wed, Feb 3, 2016 at 12:26 AM, Hans Westgaard Ry
<hans.westgaard.ry@...cle.com> wrote:
> Devices may have limits on the number of fragments in an skb they support.
> Current codebase uses a constant as maximum for number of fragments one
> skb can hold and use.
> When enabling scatter/gather and running traffic with many small messages
> the codebase uses the maximum number of fragments and may thereby violate
> the max for certain devices.
> The patch introduces a global variable as max number of fragments.
>
> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
> Reviewed-by: Håkon Bugge <haakon.bugge@...cle.com>
>
> ---
>  include/linux/skbuff.h     |  1 +
>  net/core/skbuff.c          |  2 ++
>  net/core/sysctl_net_core.c | 10 ++++++++++
>  net/ipv4/tcp.c             |  4 ++--
>  4 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 4355129..fe47ad3 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -219,6 +219,7 @@ struct sk_buff;
>  #else
>  #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
>  #endif
> +extern int sysctl_max_skb_frags;
>
>  typedef struct skb_frag_struct skb_frag_t;
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 152b9c7..c336b97 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -79,6 +79,8 @@
>
>  struct kmem_cache *skbuff_head_cache __read_mostly;
>  static struct kmem_cache *skbuff_fclone_cache __read_mostly;
> +int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
> +EXPORT_SYMBOL(sysctl_max_skb_frags);
>
>  /**
>   *     skb_panic - private function for out-of-line support
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index 95b6139..a6beb7b 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c

I really don't think these changes belong in the core. Below you only
modify the TCP code path so this more likely belongs in the TCP path
unless you are going to guarantee that all other code paths obey the
sysctl.  It probably belongs in net/ipv4/sysctl_net_ipv4.c

> @@ -26,6 +26,7 @@ static int zero = 0;
>  static int one = 1;
>  static int min_sndbuf = SOCK_MIN_SNDBUF;
>  static int min_rcvbuf = SOCK_MIN_RCVBUF;
> +static int max_skb_frags = MAX_SKB_FRAGS;
>
>  static int net_msg_warn;       /* Unused, but still a sysctl */
>
> @@ -392,6 +393,15 @@ static struct ctl_table net_core_table[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec
>         },
> +       {
> +               .procname       = "max_skb_frags",
> +               .data           = &sysctl_max_skb_frags,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec_minmax,
> +               .extra1         = &one,
> +               .extra2         = &max_skb_frags,
> +       },
>         { }
>  };

I'm not really a fan of this name either.  Maybe it should be
something like tcp_max_gso_frags.

> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index c82cca1..3dc7a2fd 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -938,7 +938,7 @@ new_segment:
>
>                 i = skb_shinfo(skb)->nr_frags;
>                 can_coalesce = skb_can_coalesce(skb, i, page, offset);
> -               if (!can_coalesce && i >= MAX_SKB_FRAGS) {
> +               if (!can_coalesce && i >= sysctl_max_skb_frags) {
>                         tcp_mark_push(tp, skb);
>                         goto new_segment;
>                 }
> @@ -1211,7 +1211,7 @@ new_segment:
>
>                         if (!skb_can_coalesce(skb, i, pfrag->page,
>                                               pfrag->offset)) {
> -                               if (i == MAX_SKB_FRAGS || !sg) {
> +                               if (i == sysctl_max_skb_frags || !sg) {
>                                         tcp_mark_push(tp, skb);
>                                         goto new_segment;
>                                 }

This bit looks good.

I was wondering.  Have you considered looking at something like what
was done with gso_max_size?  It seems like it is meant to address a
problem similar to what you have described where the NICs only support
a certain layout for the GSO frame.  Though now that I look over the
code it seems like it might be flawed in that I don't see bridges or
tunnels really respecting the value so it seems like they could cause
issues.

- Alex

Powered by blists - more mailing lists