Message-ID: <20160825120533.352bbd1b@pixies>
Date: Thu, 25 Aug 2016 12:05:33 +0300
From: Shmulik Ladkani <shmulik.ladkani@...il.com>
To: Florian Westphal <fw@...len.de>,
"David S. Miller" <davem@...emloft.net>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Eric Dumazet <edumazet@...gle.com>,
Herbert Xu <herbert@...dor.apana.org.au>
Cc: netdev@...r.kernel.org
Subject: Re: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size
clamping if segments exceed mtu
Hi,
On Mon, 22 Aug 2016 14:58:42 +0200, fw@...len.de wrote:
> Shmulik Ladkani <shmulik.ladkani@...il.com> wrote:
> > There are cases where gso skbs (which originate from an ingress
> > interface) have a gso_size value that exceeds the output dst mtu:
> >
> > - ipv4 forwarding middlebox having in/out interfaces with different mtus
> > addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding path'
> > - bridge having a tunnel member interface stacked over a device with small mtu
> > addressed by b8247f095e 'net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs'
> >
> > In both cases, such skbs are identified, then go through early software
> > segmentation+fragmentation as part of ip_finish_output_gso.
> >
> > Another approach is to shrink the gso_size to a value suitable so
> > resulting segments are smaller than dst mtu, as suggested by Eric
> > Dumazet (as part of [1]) and Florian Westphal (as part of [2]).
> >
> > This obviates the need for software segmentation/fragmentation at
> > ip_finish_output_gso, thus significantly improving throughput and
> > lowering cpu load.
> >
> > This RFC patch attempts to implement this gso_size clamping.
> >
> > [1] https://patchwork.ozlabs.org/patch/314327/
> > [2] https://patchwork.ozlabs.org/patch/644724/
> >
> > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@...il.com>
> > ---
> >
> > Florian, in fe6cc55f you described a BUG due to gso_size decrease.
> > I've tested both bridged and routed cases, but in my setups I failed to
> > hit the issue; I'd appreciate it if you could provide some hints.
>
> Still get the BUG, I applied this patch on top of net-next.
The BUG triggers when GRO takes place on the ingress, and only if GRO
merges skbs into the frag_list (OTOH when segments are placed only into
the frags[] of a single skb, skb_segment succeeds even if gso_size was
altered).
This is due to the assumption that frag_list members terminate on exact
MSS boundaries, which no longer holds once gso_size is clamped.
The assumption dates back to the original frag_list support in
skb_segment:
89319d3801 net: Add frag_list support to skb_segment
This patch adds limited support for handling frag_list packets in
skb_segment. The intention is to support GRO (Generic Receive Offload)
packets which will be constructed by chaining normal packets using
frag_list.
As such we require all frag_list members terminate on exact MSS
boundaries. This is checked using BUG_ON.
As there should only be one producer in the kernel of such packets,
namely GRO, this requirement should not be difficult to maintain.
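To make the broken invariant concrete, here is a toy userspace sketch
(not kernel code; the sizes are made up for illustration). Suppose GRO
built a frag_list whose members each carry two full segments of the
original gso_size (say 1448). Each member's length is then an exact
multiple of the MSS, as skb_segment's BUG_ON requires; once gso_size is
clamped down to, say, 1240 to fit a smaller egress MTU, the member
lengths are unchanged but the divisor is not, and the check fails:

```c
#include <stdbool.h>

/* Models the invariant skb_segment enforces with BUG_ON: every
 * frag_list member's payload length must be an exact multiple of
 * gso_size (the MSS). Sizes used below are hypothetical. */
static bool terminates_on_mss(unsigned int member_len, unsigned int mss)
{
	return member_len % mss == 0;
}

/* At GRO time:          terminates_on_mss(2 * 1448, 1448) -> true
 * After clamping to 1240: terminates_on_mss(2 * 1448, 1240) -> false,
 * i.e. the BUG_ON in skb_segment fires. */
```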
We have a few alternatives for gso_size clamping:
1. Fix 'skb_segment' arithmetic to support inputs that do not match
   the "frag_list members terminate on exact MSS" assumption.
2. Perform gso_size clamping in 'ip_finish_output_gso' for non-GROed
   skbs only. Other usecases will still benefit: (a) packets arriving
   from virtualized interfaces, e.g. tap and friends; (b) packets
   arriving from a veth of another namespace (packets locally generated
   by the TCP stack in a different netns).
3. Ditch the idea, again ;)
We can pursue (1), especially if there are other benefits to doing so.
OTOH, given the current complexity of 'skb_segment', this is a bit
risky.
Going with (2) could be reasonable too, as it brings value for the
virtualized environments that need gso_size clamping, while
presenting minimal risk.
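A minimal sketch of what (2) might look like, lifted into plain C with
hypothetical names (the real code would live in ip_finish_output_gso
and use skb_has_frag_list() and skb_shinfo(skb)->gso_size; the struct
and function below are illustrative only, not the proposed patch):

```c
#include <stdbool.h>

/* Simplified stand-in for the fields the decision depends on. */
struct fake_skb {
	bool		has_frag_list;	/* models skb_has_frag_list() */
	unsigned int	gso_size;	/* models skb_shinfo(skb)->gso_size */
};

/* Clamp gso_size only when the skb carries no frag_list, so
 * skb_segment's MSS-boundary assumption is never violated.
 * Returns true if the skb now fits the egress MTU, false if the
 * caller must fall back to the existing software
 * segmentation/fragmentation path. */
static bool maybe_clamp_gso(struct fake_skb *skb,
			    unsigned int mtu, unsigned int hdr_len)
{
	unsigned int max_seg = mtu - hdr_len;	/* payload fitting the MTU */

	if (skb->gso_size <= max_seg)
		return true;			/* already fits */

	if (skb->has_frag_list)
		return false;			/* GROed into frag_list: unsafe */

	skb->gso_size = max_seg;		/* safe to shrink */
	return true;
}
```

Non-GROed skbs (tap and friends, veth from another netns) would take
the clamping branch; frag_list GRO packets would keep the existing
segment-and-fragment path, sidestepping the BUG entirely.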
Appreciate your opinions.
Regards,
Shmulik