Message-ID: <1414405781.4492.38.camel@ubuntu-vm-makita>
Date:	Mon, 27 Oct 2014 19:29:41 +0900
From:	Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
To:	netdev@...r.kernel.org
Cc:	Herbert Xu <herbert@...dor.apana.org.au>,
	Eric Dumazet <edumazet@...gle.com>
Subject: Poor UDP throughput with virtual devices and UFO

Hi,

I recently noticed that sending UDP packets results in very poor throughput
when UFO is used together with stacked virtual devices.

Example configurations are:
- macvlan on vlan
- gre on bridge

With these configurations, the upper virtual devices (macvlan, gre) have the
UFO feature and the lower devices (vlan, bridge) do not. UFO packets are sent
from the upper devices and fragmented in software on the lower devices, so
they are already fragmented before entering the qdisc.

Since skb_segment() doesn't charge the resulting segments to sk_wmem_alloc,
the send buffer of a UDP socket almost always looks empty. User space can
therefore keep sending without ever being throttled, which causes massive
drops at the qdisc.
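
The effect described above can be illustrated with a small user-space toy
model. This is only a sketch, not kernel code: simulate_udp_sender(), struct
sim_result, and every constant except the 212992-byte send buffer (taken from
the netperf output) are assumptions made up for illustration. The model lets
an application send while a counter standing in for sk_wmem_alloc is below
the send buffer size, splits each datagram into fragments, and queues them on
a bounded queue that drains at a fixed rate.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name and constant here is illustrative, not kernel code. */
struct sim_result {
    long sent;      /* datagrams accepted from the application */
    long dropped;   /* fragments dropped because the queue was full */
    long delivered; /* fragments drained from the queue */
};

struct sim_result simulate_udp_sender(bool charge_segments, int ticks)
{
    const long sndbuf = 212992;        /* send buffer size from the netperf output */
    const int frags_per_dgram = 45;    /* assumed: a 65507-byte datagram at MTU 1500 */
    const long frag_size = 1456;       /* assumed bytes charged per fragment */
    const int queue_limit = 1000;      /* assumed queue length limit, in packets */
    const int drain_per_tick = 50;     /* assumed device rate, fragments per tick */
    const int max_sends_per_tick = 10; /* assumed application send rate */

    struct sim_result r = {0, 0, 0};
    long wmem = 0;  /* stands in for sk_wmem_alloc */
    int queued = 0; /* fragments currently sitting in the queue */

    assert(ticks >= 0);
    for (int t = 0; t < ticks; t++) {
        /* Application side: send until the socket looks full. */
        for (int s = 0; s < max_sends_per_tick; s++) {
            if (wmem >= sndbuf)
                break; /* sender would block here */
            r.sent++;
            /* The large packet is segmented before it reaches the queue. */
            for (int f = 0; f < frags_per_dgram; f++) {
                if (queued >= queue_limit) {
                    r.dropped++; /* queue overflow */
                    continue;
                }
                queued++;
                if (charge_segments)
                    wmem += frag_size; /* fragment is charged to the socket */
            }
        }
        /* Device side: drain a fixed number of fragments and uncharge them. */
        int n = queued < drain_per_tick ? queued : drain_per_tick;
        queued -= n;
        r.delivered += n;
        if (charge_segments)
            wmem -= (long)n * frag_size;
    }
    return r;
}
```

Comparing simulate_udp_sender(false, 1000) with simulate_udp_sender(true, 1000)
shows the difference in this model: without charging, the counter never moves,
the sender is never throttled, and the queue overflows; with charging, the
sender stops once its queued fragments fill the send buffer and nothing is
dropped.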

I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
whether this change is acceptable, since the behavior has been this way for
years and so far only TCP handles it (d6a4a1041176 "tcp: GSO should be TSQ
friendly").

Here are performance test results (macvlan on vlan):

- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.23      0.00     25.26      0.08      0.00     74.43
Average:          0      0.29      0.00      0.76      0.29      0.00     98.66
Average:          1      0.21      0.00      0.33      0.00      0.00     99.45
Average:          2      0.05      0.00      0.12      0.07      0.00     99.76
Average:          3      0.36      0.00     99.64      0.00      0.00      0.00

- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.18      0.00      8.38      0.02      0.00     91.43
Average:          0      0.17      0.00      3.60      0.00      0.00     96.23
Average:          1      0.13      0.00      6.60      0.00      0.00     93.27
Average:          2      0.23      0.00      5.76      0.07      0.00     93.94
Average:          3      0.17      0.00     17.57      0.00      0.00     82.26
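
As a cross-check of the netperf tables above, the throughput column can be
reproduced from the message count, message size, and elapsed time. In
UDP_STREAM output the first result line is the sending side and the second is
the receiving side. The helper name udp_stream_mbps() is hypothetical and not
part of netperf; this is only a sketch of the arithmetic.

```c
#include <assert.h>

/*
 * Throughput in 10^6 bits/sec for `messages` datagrams of `msg_bytes`
 * bytes each, transferred over `secs` seconds.
 */
double udp_stream_mbps(long messages, long msg_bytes, double secs)
{
    assert(secs > 0.0);
    return (double)messages * (double)msg_bytes * 8.0 / secs / 1e6;
}
```

With the figures above, udp_stream_mbps(144096, 65507, 60.0) gives about
1258.6 for the sender before the patch, udp_stream_mbps(51, 65507, 60.0)
gives about 0.45 for the receiver before the patch, and
udp_stream_mbps(109593, 65507, 60.0) gives about 957.2 for both sides after
the patch. These agree with the reported values up to rounding of the elapsed
time.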


The patch (based on net tree) for the test above:

----
Subject: [PATCH net] gso: Inherit sk_wmem_alloc

Signed-off-by: Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
---
 net/core/skbuff.c      |  6 +++++-
 net/ipv4/tcp_offload.c | 13 ++++---------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c16615b..29dc763 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3020,7 +3020,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 							    len, 0);
 			SKB_GSO_CB(nskb)->csum_start =
 			    skb_headroom(nskb) + doffset;
-			continue;
+			goto set_owner;
 		}
 
 		nskb_frag = skb_shinfo(nskb)->frags;
@@ -3092,6 +3092,10 @@ perform_csum_check:
 			SKB_GSO_CB(nskb)->csum_start =
 			    skb_headroom(nskb) + doffset;
 		}
+
+set_owner:
+		if (head_skb->sk)
+			skb_set_owner_w(nskb, head_skb->sk);
 	} while ((offset += len) < head_skb->len);
 
 	/* Some callers want to get the end of the list.
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5b90f2f..93758a8 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -139,11 +139,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 			th->check = gso_make_checksum(skb, ~th->check);
 
 		seq += mss;
-		if (copy_destructor) {
+		if (copy_destructor)
 			skb->destructor = gso_skb->destructor;
-			skb->sk = gso_skb->sk;
-			sum_truesize += skb->truesize;
-		}
 		skb = skb->next;
 		th = tcp_hdr(skb);
 
@@ -157,11 +154,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	 * is freed by GSO engine
 	 */
 	if (copy_destructor) {
-		swap(gso_skb->sk, skb->sk);
-		swap(gso_skb->destructor, skb->destructor);
-		sum_truesize += skb->truesize;
-		atomic_add(sum_truesize - gso_skb->truesize,
-			   &skb->sk->sk_wmem_alloc);
+		skb->destructor = gso_skb->destructor;
+		gso_skb->destructor = NULL;
+		atomic_sub(gso_skb->truesize, &skb->sk->sk_wmem_alloc);
 	}
 
 	delta = htonl(oldlen + (skb_tail_pointer(skb) -
-- 
1.8.1.2


