lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1471867570-1406-1-git-send-email-shmulik.ladkani@gmail.com>
Date:   Mon, 22 Aug 2016 15:06:10 +0300
From:   Shmulik Ladkani <shmulik.ladkani@...il.com>
To:     "David S. Miller" <davem@...emloft.net>
Cc:     netdev@...r.kernel.org,
        Shmulik Ladkani <shmulik.ladkani@...il.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Florian Westphal <fw@...len.de>
Subject: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu

There are cases where gso skbs (which originate from an ingress
interface) have a gso_size value that exceeds the output dst mtu:

 - ipv4 forwarding middlebox having in/out interfaces with different mtus
   addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding path'
 - bridge having a tunnel member interface stacked over a device with small mtu
   addressed by b8247f095e 'net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs'

In both cases, such skbs are identified, then go through early software
segmentation+fragmentation as part of ip_finish_output_gso.

Another approach is to shrink the gso_size to a value suitable so
resulting segments are smaller than dst mtu, as suggeted by Eric
Dumazet (as part of [1]) and Florian Westphal (as part of [2]).

This will void the need for software segmentation/fragmentation at
ip_finish_output_gso, thus significantly improve throughput and lower
cpu load.

This RFC patch attempts to implement this gso_size clamping.

[1] https://patchwork.ozlabs.org/patch/314327/
[2] https://patchwork.ozlabs.org/patch/644724/

Cc: Hannes Frederic Sowa <hannes@...essinduktion.org>
Cc: Eric Dumazet <edumazet@...gle.com>
Cc: Florian Westphal <fw@...len.de>

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@...il.com>
---

 Comments welcome.

 Few questions embedded in the patch.

 Florian, in fe6cc55f you described a BUG due to gso_size decrease.
 I've tested both bridged and routed cases, but in my setups failed to
 hit the issue; Appreciate if you can provide some hints.

 net/ipv4/ip_output.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index dde37fb..b911b43 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -216,6 +216,40 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	return -EINVAL;
 }
 
+static inline int skb_gso_clamp(struct sk_buff *skb, unsigned int network_len)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	unsigned short gso_size;
+	unsigned int seglen;
+
+	if (shinfo->gso_size == GSO_BY_FRAGS)
+		return -EINVAL;
+
+	seglen = skb_gso_network_seglen(skb);
+
+	/* Decrease gso_size to fit specified network length */
+	gso_size = shinfo->gso_size - (seglen - network_len);
+	if (shinfo->gso_type & SKB_GSO_UDP)
+		gso_size &= ~7;
+
+	if (!(shinfo->gso_type & SKB_GSO_DODGY)) {
+		/* Recalculate gso_segs for skbs of trusted sources.
+		 * Untrusted skbs will have their gso_segs calculated by
+		 * skb_gso_segment.
+		 */
+		unsigned int hdr_len, payload_len;
+
+		hdr_len = seglen - shinfo->gso_size;
+		payload_len = skb->len - hdr_len;
+		shinfo->gso_segs = DIV_ROUND_UP(payload_len, gso_size);
+
+		// Q: need to verify gso_segs <= dev->gso_max_segs?
+		//    seems to be protected by netif_skb_features
+	}
+	shinfo->gso_size = gso_size;
+	return 0;
+}
+
 static int ip_finish_output_gso(struct net *net, struct sock *sk,
 				struct sk_buff *skb, unsigned int mtu)
 {
@@ -237,6 +271,16 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
 	 * from host network stack.
 	 */
+
+	/* Attempt to clamp gso_size to avoid segmenting and fragmenting.
+	 *
+	 * Q: policy needed? per device?
+	 */
+	if (sysctl_ip_output_gso_clamp) {
+		if (!skb_gso_clamp(skb, mtu))
+			return ip_finish_output2(net, sk, skb);
+	}
+
 	features = netif_skb_features(skb);
 	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
 	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
-- 
1.9.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ