Message-ID: <529DF340.70602@mellanox.com>
Date: Tue, 3 Dec 2013 17:05:36 +0200
From: Or Gerlitz <ogerlitz@...lanox.com>
To: Eric Dumazet <edumazet@...gle.com>,
Alexei Starovoitov <ast@...mgrid.com>,
Pravin B Shelar <pshelar@...ira.com>
CC: David Miller <davem@...emloft.net>, netdev <netdev@...r.kernel.org>
Subject: vxlan/veth performance issues on net.git + latest kernels
I've been chasing a performance issue lately that comes into play when
combining veth and vxlan over a fast Ethernet NIC.
I came across it while working to enable TCP stateless offloads for
vxlan encapsulated traffic in the mlx4 driver, but I can clearly see the
issue without any HW offloads involved, so it is easier to discuss it
that way (no offloads involved).
The setup involves a stacked {veth --> bridge --> vxlan --> IP stack -->
NIC} or {veth --> ovs+vxlan --> IP stack --> NIC} chain.
Basically, in my testbed, which uses iperf over 40Gb/s Mellanox NICs,
vxlan traffic goes up to 5-7Gb/s for a single session and up to 14Gb/s
for multiple sessions, as long as veth isn't involved. Once veth is
used, I can't get above 7-8Gb/s, no matter how many sessions are used.
For the time being, I manually took the tunneling overhead into account
and reduced the veth pair MTU by 50 bytes.
Looking at the kernel TCP counters in a {veth --> bridge --> vxlan -->
NIC} configuration, on the client side I see lots of hits for the
following TCP counters (the numbers are just a single sample; I look at
the output of iterative sampling every second, e.g. using "watch -d -n 1
netstat -st"; a filtered variant is sketched right after the list):
67092 segments retransmited
31461 times recovered from packet loss due to SACK data
Detected reordering 1045142 times using SACK
436215 fast retransmits
59966 forward retransmits
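The filtered variant I mean is along these lines (the grep pattern is
only illustrative, adjust to taste):

watch -d -n 1 'netstat -st | grep -E -i "retrans|reorder|sack|quick ack"'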
Also, on the passive side I see hits for the "Quick ack mode was
activated N times" counter; see the full snapshot of the counters from
both sides below.
Without using veth, e.g. when running in a {vxlan --> NIC} or {bridge -->
vxlan --> NIC} configuration, I see hits only for the "recovered from
packet loss due to SACK data" and fast retransmits counters, but not for
the forward retransmits or "Detected reordering N times using SACK"
counters. Also, the quick ack mode counter isn't active on the passive
side.
I tested net.git (3.13-rc2+), 3.12.2 and 3.11.9, and I see the same
problems on all of them. At this point I don't really see a known-good
past point against which to bisect, so I hope this counter report can
help shed some light on the nature of the problem and a possible
solution. Ideas welcome!
Without vxlan, these are the Gb/s results for 1/2/4 streams over 3.12.2
(the results for net.git are pretty much the same):
18/32/38 NIC
17/30/35 bridge --> NIC
14/23/35 veth --> bridge --> NIC
With vxlan, these are the Gb/s results for 1/2/4 streams:
6/12/14 vxlan --> IP --> NIC
5/10/14 bridge --> vxlan --> IP --> NIC
6/7/7 veth --> bridge --> vxlan --> IP --> NIC
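For completeness, the iperf invocations behind these numbers are roughly
the following (the flags shown are illustrative; the destination address
is the one of the topmost device in the chain under test, e.g. veth0 for
the full veth chain):

# on the server
iperf -s
# on the client, -P 1/2/4 for 1/2/4 streams
iperf -c 192.168.62.147 -t 30 -P 4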
Also, the 3.12.2 numbers don't get any better even when adding a ported
version of commit 82d8189826d5 "veth: extend features to support
tunneling" on top of 3.12.2.
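The port boils down to something like the following (assuming a tree
with the v3.12.2 tag and at most trivial conflict fixups):

git checkout -b veth-tunneling v3.12.2
git cherry-pick 82d8189826d5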
See at the end the sequence of commands I used to set up the environment.
Or.
--> TCP counters from active side
# netstat -ts
IcmpMsg:
InType0: 2
InType8: 1
OutType0: 1
OutType3: 4
OutType8: 2
Tcp:
189 active connections openings
4 passive connection openings
0 failed connection attempts
0 connection resets received
4 connections established
22403193 segments received
541234150 segments send out
14248 segments retransmited
0 bad segments received.
5 resets sent
UdpLite:
TcpExt:
2 invalid SYN cookies received
178 TCP sockets finished time wait in fast timer
10 delayed acks sent
Quick ack mode was activated 1 times
4 packets directly queued to recvmsg prequeue.
3728 packets directly received from backlog
2 packets directly received from prequeue
2524 packets header predicted
4 packets header predicted and directly queued to user
19793310 acknowledgments not containing data received
1216966 predicted acknowledgments
2130 times recovered from packet loss due to SACK data
Detected reordering 73 times using FACK
Detected reordering 11424 times using SACK
55 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 457
2 congestion windows recovered after partial ack
11498 fast retransmits
2748 forward retransmits
2 other TCP timeouts
TCPLossProbes: 4
3 DSACKs sent for old packets
TCPSackShifted: 1037782
TCPSackMerged: 332827
TCPSackShiftFallback: 598055
TCPRcvCoalesce: 380
TCPOFOQueue: 463
TCPSpuriousRtxHostQueues: 192
IpExt:
InNoRoutes: 1
InMcastPkts: 191
OutMcastPkts: 28
InBcastPkts: 25
InOctets: 1789360097
OutOctets: 893757758988
InMcastOctets: 8152
OutMcastOctets: 3044
InBcastOctets: 4259
InNoECTPkts: 30117553
--> TCP counters from passive side
# netstat -ts
IcmpMsg:
InType0: 1
InType8: 2
OutType0: 2
OutType3: 5
OutType8: 1
Tcp:
75 active connections openings
140 passive connection openings
0 failed connection attempts
0 connection resets received
4 connections established
146888643 segments received
27430160 segments send out
0 segments retransmited
0 bad segments received.
6 resets sent
UdpLite:
TcpExt:
3 invalid SYN cookies received
72 TCP sockets finished time wait in fast timer
10 delayed acks sent
3 delayed acks further delayed because of locked socket
Quick ack mode was activated 13548 times
4 packets directly queued to recvmsg prequeue.
2 packets directly received from prequeue
139384763 packets header predicted
2 packets header predicted and directly queued to user
671 acknowledgments not containing data received
938 predicted acknowledgments
TCPLossProbes: 2
TCPLossProbeRecovery: 1
14 DSACKs sent for old packets
TCPBacklogDrop: 848
TCPRcvCoalesce: 118368414
TCPOFOQueue: 3167879
IpExt:
InNoRoutes: 1
InMcastPkts: 184
OutMcastPkts: 26
InBcastPkts: 26
InOctets: 1007364296775
OutOctets: 2433872888
InMcastOctets: 6202
OutMcastOctets: 2888
InBcastOctets: 4597
InNoECTPkts: 702313233
client side (node 144)
----------------------
# vxlan device (VNI 42), flooding/learning over the multicast group
ip link add vxlan42 type vxlan id 42 group 239.0.0.42 ttl 10 dev ethN
ifconfig vxlan42 192.168.42.144/24 up
# bridge tying the vxlan device to one side of the veth pair
brctl addbr br-vx
ip link set br-vx up
ifconfig br-vx 192.168.52.144/24 up
brctl addif br-vx vxlan42
# veth pair: veth1 is enslaved to the bridge, veth0 carries the test IP
ip link add type veth
brctl addif br-vx veth1
ifconfig veth0 192.168.62.144/24 up
ip link set veth1 up
# account for the 50-byte vxlan encapsulation overhead
ifconfig veth0 mtu 1450
ifconfig veth1 mtu 1450
server side (node 147)
----------------------
ip link add vxlan42 type vxlan id 42 group 239.0.0.42 ttl 10 dev ethN
ifconfig vxlan42 192.168.42.147/24 up
brctl addbr br-vx
ip link set br-vx up
ifconfig br-vx 192.168.52.147/24 up
brctl addif br-vx vxlan42
ip link add type veth
brctl addif br-vx veth1
ifconfig veth0 192.168.62.147/24 up
ip link set veth1 up
ifconfig veth0 mtu 1450
ifconfig veth1 mtu 1450
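One way to sanity-check the resulting topology on either node:

brctl show br-vx
ip -d link show vxlan42
ip addr show veth0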