Date:	Tue, 03 Dec 2013 07:30:20 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Or Gerlitz <ogerlitz@...lanox.com>
Cc:	Eric Dumazet <edumazet@...gle.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Pravin B Shelar <pshelar@...ira.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>
Subject: Re: vxlan/veth performance issues on net.git + latest kernels

On Tue, 2013-12-03 at 17:05 +0200, Or Gerlitz wrote:
> I've lately been chasing a performance issue which comes into play when 
> combining veth and vxlan over a fast Ethernet NIC.
> 
> I came across it while working to enable TCP stateless offloads for 
> vxlan encapsulated traffic in the mlx4 driver, but I can clearly see the 
> issue without any HW offloads involved, so it would be easier to discuss 
> it that way (no offloads involved).
> 
> The setup involves a stacked {veth --> bridge --> vxlan --> IP stack --> 
> NIC} or {veth --> ovs+vxlan --> IP stack --> NIC} chain.
> 
> Basically, in my testbed which uses iperf over 40Gb/s Mellanox NICs, 
> vxlan traffic goes up to 5-7Gb/s for a single session and up to 14Gb/s for 
> multiple sessions, as long as veth isn't involved. Once veth is used I 
> can't get to > 7-8Gb/s, no matter how many sessions are used. For the 
> time being, I manually took the tunneling overhead into account and 
> reduced the veth pair MTU by 50 bytes.
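
(For reference, the 50 bytes correspond to the VXLAN-over-IPv4 encapsulation
overhead: 14-byte inner Ethernet + 8-byte VXLAN + 8-byte UDP + 20-byte outer
IPv4 header = 50 bytes, so a 1450-byte inner MTU keeps the encapsulated
packet within a 1500-byte physical MTU.)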
> 
> Looking at the kernel TCP counters in a {veth --> bridge --> vxlan --> 
> NIC} configuration, on the client side I see lots of hits for the 
> following TCP counters (the numbers are just a single sample; I look at 
> the output of iterative sampling every second, e.g. using "watch -d -n 1 
> netstat -st"):
> 
> 67092 segments retransmited
> 
> 31461 times recovered from packet loss due to SACK data
> Detected reordering 1045142 times using SACK
> 436215 fast retransmits
> 59966 forward retransmits
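
A minimal way to sample just these counters once per second, instead of the
full "watch -d -n 1 netstat -st" output (a sketch; the grep pattern is only
an assumption matching the counter names above):

while sleep 1; do
    date +%T
    netstat -st | grep -E 'retransmit|reordering|SACK|Quick ack'
done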
> 
> Also, on the passive side I see hits for the "Quick ack mode was 
> activated N times" counter; see below for a full snapshot of the counters 
> from both sides.
> 
> Without using veth, e.g. when running in a {vxlan --> NIC} or {bridge --> 
> vxlan --> NIC} configuration, I see hits only for the "recovered from 
> packet loss due to SACK data" and fast retransmits counters, but not for 
> the forward retransmits or "Detected reordering N times using SACK" 
> counters. Also, the quick ack mode counter isn't active on the passive side.
> 
> I tested net.git (3.13-rc2+), 3.12.2 and 3.11.9, and I see the same problems 
> on all of them. At this point I don't really see a known-good past point 
> from which to bisect, so I hope this counter report can help shed some light 
> on the nature of the problem and a possible solution. Ideas welcome!
> 
> Without vxlan, these are the Gb/s results for 1/2/4 streams over 3.12.2; 
> the results for net.git are pretty much the same:
> 
> 18/32/38  NIC
> 17/30/35  bridge --> NIC
> 14/23/35  veth --> bridge --> NIC
> 
> With vxlan, these are the Gb/s results for 1/2/4 streams:
> 
> 6/12/14  vxlan --> IP --> NIC
> 5/10/14  bridge --> vxlan --> IP --> NIC
> 6/7/7    veth --> bridge --> vxlan --> IP --> NIC
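
For context, the multi-stream numbers above map to iperf runs roughly like
the following (a sketch; the exact options are an assumption, with the server
address taken from the veth setup at the end):

iperf -s                               # passive side
iperf -c 192.168.62.147 -P 4 -t 30     # active side, 4 parallel streams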
> 
> Also, the 3.12.2 numbers don't get any better even when adding a ported 
> version of 82d8189826d5 ("veth: extend features to support tunneling") on 
> top of 3.12.2.
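
Porting that commit is presumably just a cherry-pick onto a stable checkout
(a sketch; the branch name is hypothetical and the tree is assumed to contain
both the v3.12.2 tag and the mainline commit):

git checkout -b veth-tunnel-3.12.2 v3.12.2
git cherry-pick 82d8189826d5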
> 
> See at the end the sequence of commands I use to set up the environment.
> 
> Or.
> 
> 
> --> TCP counters from active side
> 
> # netstat -ts
> IcmpMsg:
>      InType0: 2
>      InType8: 1
>      OutType0: 1
>      OutType3: 4
>      OutType8: 2
> Tcp:
>      189 active connections openings
>      4 passive connection openings
>      0 failed connection attempts
>      0 connection resets received
>      4 connections established
>      22403193 segments received
>      541234150 segments send out
>      14248 segments retransmited
>      0 bad segments received.
>      5 resets sent
> UdpLite:
> TcpExt:
>      2 invalid SYN cookies received
>      178 TCP sockets finished time wait in fast timer
>      10 delayed acks sent
>      Quick ack mode was activated 1 times
>      4 packets directly queued to recvmsg prequeue.
>      3728 packets directly received from backlog
>      2 packets directly received from prequeue
>      2524 packets header predicted
>      4 packets header predicted and directly queued to user
>      19793310 acknowledgments not containing data received
>      1216966 predicted acknowledgments
>      2130 times recovered from packet loss due to SACK data
>      Detected reordering 73 times using FACK
>      Detected reordering 11424 times using SACK
>      55 congestion windows partially recovered using Hoe heuristic
>      TCPDSACKUndo: 457
>      2 congestion windows recovered after partial ack
>      11498 fast retransmits
>      2748 forward retransmits
>      2 other TCP timeouts
>      TCPLossProbes: 4
>      3 DSACKs sent for old packets
>      TCPSackShifted: 1037782
>      TCPSackMerged: 332827
>      TCPSackShiftFallback: 598055
>      TCPRcvCoalesce: 380
>      TCPOFOQueue: 463
>      TCPSpuriousRtxHostQueues: 192
> IpExt:
>      InNoRoutes: 1
>      InMcastPkts: 191
>      OutMcastPkts: 28
>      InBcastPkts: 25
>      InOctets: 1789360097
>      OutOctets: 893757758988
>      InMcastOctets: 8152
>      OutMcastOctets: 3044
>      InBcastOctets: 4259
>      InNoECTPkts: 30117553
> 
> 
> 
> --> TCP counters from passive side
> 
> netstat -ts
> IcmpMsg:
>      InType0: 1
>      InType8: 2
>      OutType0: 2
>      OutType3: 5
>      OutType8: 1
> Tcp:
>      75 active connections openings
>      140 passive connection openings
>      0 failed connection attempts
>      0 connection resets received
>      4 connections established
>      146888643 segments received
>      27430160 segments send out
>      0 segments retransmited
>      0 bad segments received.
>      6 resets sent
> UdpLite:
> TcpExt:
>      3 invalid SYN cookies received
>      72 TCP sockets finished time wait in fast timer
>      10 delayed acks sent
>      3 delayed acks further delayed because of locked socket
>      Quick ack mode was activated 13548 times
>      4 packets directly queued to recvmsg prequeue.
>      2 packets directly received from prequeue
>      139384763 packets header predicted
>      2 packets header predicted and directly queued to user
>      671 acknowledgments not containing data received
>      938 predicted acknowledgments
>      TCPLossProbes: 2
>      TCPLossProbeRecovery: 1
>      14 DSACKs sent for old packets
>      TCPBacklogDrop: 848

That's bad: dropping packets on the receiver.

Check also "ifconfig -a" to see if the RX drop counters are increasing as well.
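
For example (a sketch; ethN stands for the actual receiving NIC):

watch -d -n 1 'ip -s link show dev ethN'
cat /sys/class/net/ethN/statistics/rx_dropped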

>      TCPRcvCoalesce: 118368414

Lack of GRO: the receiver seems unable to receive as fast as you want.
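
A quick way to check, and if needed enable, GRO on the receiving NIC
(a sketch; ethN is a placeholder):

ethtool -k ethN | grep generic-receive-offload
ethtool -K ethN gro on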

>      TCPOFOQueue: 3167879

So many packets are received out of order (because of losses)
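
One way to watch this counter move in real time, assuming iproute2's nstat
is available:

nstat -az TcpExtTCPOFOQueue              # absolute value
watch -n 1 nstat -z TcpExtTCPOFOQueue    # per-interval increments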

> IpExt:
>      InNoRoutes: 1
>      InMcastPkts: 184
>      OutMcastPkts: 26
>      InBcastPkts: 26
>      InOctets: 1007364296775
>      OutOctets: 2433872888
>      InMcastOctets: 6202
>      OutMcastOctets: 2888
>      InBcastOctets: 4597
>      InNoECTPkts: 702313233
> 
> 
> client side (node 144)
> ----------------------
> 
> ip link add vxlan42 type vxlan id 42 group 239.0.0.42 ttl 10 dev ethN
> ifconfig vxlan42 192.168.42.144/24 up
> 
> brctl addbr br-vx
> ip link set br-vx up
> 
> ifconfig br-vx 192.168.52.144/24 up
> brctl addif br-vx vxlan42
> 
> ip link add type veth
> brctl addif br-vx veth1
> ifconfig veth0 192.168.62.144/24 up
> ip link set veth1 up
> 
> ifconfig veth0 mtu 1450
> ifconfig veth1 mtu 1450
> 
> 
> server side (node 147)
> ----------------------
> 
> ip link add vxlan42 type vxlan id 42 group 239.0.0.42 ttl 10 dev ethN
> ifconfig vxlan42 192.168.42.147/24 up
> 
> brctl addbr br-vx
> ip link set br-vx up
> 
> ifconfig br-vx 192.168.52.147/24 up
> brctl addif br-vx vxlan42
> 
> 
> ip link add type veth
> brctl addif br-vx veth1
> ifconfig veth0 192.168.62.147/24 up
> ip link set veth1 up
> 
> ifconfig veth0 mtu 1450
> ifconfig veth1 mtu 1450
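
For reference, the brctl steps above can also be expressed with iproute2
alone (a sketch, functionally equivalent on reasonably recent kernels):

ip link add br-vx type bridge
ip link set br-vx up
ip link set vxlan42 master br-vx
ip link set veth1 master br-vx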
> 
> 


