Date:	Tue, 3 Dec 2013 23:36:50 +0200
From:	Or Gerlitz <or.gerlitz@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Joseph Gasparakis <joseph.gasparakis@...el.com>,
	Jerry Chu <hkchu@...gle.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Pravin B Shelar <pshelar@...ira.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>
Subject: Re: vxlan/veth performance issues on net.git + latest kernels

On Tue, Dec 3, 2013 at 11:24 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Tue, 2013-12-03 at 23:09 +0200, Or Gerlitz wrote:
>> On Tue, Dec 3, 2013 at 11:11 PM, Joseph Gasparakis
>> <joseph.gasparakis@...el.com> wrote:
>>
>> >>> lack of GRO: the receiver seems unable to receive as fast as you want.
>> >>>>      TCPOFOQueue: 3167879
>> >>> So many packets are received out of order (because of losses)
>>
>> >> I see that there's no GRO in the non-veth tests that involve
>> >> vxlan either, and there the receiving side is able to consume the
>> >> packets. Do you have a rough explanation of why adding veth to the chain
>> >> is such a game changer that things start falling apart?
>>
>> > I have seen this before. Here are my findings:
>> >
>> > The gso_type differs depending on whether the skb comes from veth. From veth,
>> > you will see SKB_GSO_DODGY set. This breaks things: when an
>> > skb with DODGY set moves from vxlan to the driver through dev_xmit_hard,
>> > the stack drops it silently. I never had time to find the root cause
>> > for this, but I know it causes retransmissions and big performance
>> > degradation.
>> >
>> > I went as far as quickly hacking in a one-liner that unsets the DODGY bit
>> > in vxlan.c; that bypassed the issue and recovered the performance,
>> > but obviously this is not a real fix.
>>
>> thanks for the heads up, a few quick questions/clarifications --
>>
>> -- you are talking about drops done @ the sender side, correct? Eric was
>> saying we have evidence that the drops happen on the receiver.
>
> I suggested you take a look at the receiver, like "ifconfig -a"
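The TCPOFOQueue figure quoted earlier is one of the TcpExt counters the kernel exports in /proc/net/netstat, where each section is a line of counter names followed by a line of values carrying the same prefix. A minimal sketch for reading such a counter on the receiver so it can be sampled before and after a run; the function names are hypothetical and Python is used only for brevity:

```python
from typing import Dict


def parse_proc_net_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/netstat-style text into {section: {counter: value}}.

    Each section is two lines: 'TcpExt: NameA NameB ...' followed by
    'TcpExt: 1 2 ...' (same prefix, same number of fields).
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    sections: Dict[str, Dict[str, int]] = {}
    # Walk the lines two at a time: the names line, then the values line.
    for names, values in zip(lines[0::2], lines[1::2]):
        prefix = names[0].rstrip(":")
        if values[0].rstrip(":") != prefix or len(names) != len(values):
            raise ValueError(f"malformed section {prefix!r}")
        sections[prefix] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return sections


def read_counter(section: str, name: str, path: str = "/proc/net/netstat") -> int:
    """Read one counter, e.g. read_counter("TcpExt", "TCPOFOQueue")."""
    with open(path) as f:
        return parse_proc_net_netstat(f.read())[section][name]
```

The difference between two reads taken around a test run gives the amount of out-of-order queueing during that run; interface-level drops are reported separately, for example in the RX counters printed by `ifconfig -a`.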

Eric, sorry, I am away from the system now. I will try to get some access
and report back now, and if not, tomorrow, but


> I suspect one cpu is 100% in softirq mode draining packets from the NIC
> and feeding IP / TCP stack.

> Because of vxlan encap, all the packets are delivered to a single RX
> queue (I don't think mlx4 is able to look at inner header to get L4 info)

With the new card, ConnectX3-pro, we are able to look at inner headers
and do RX/TX checksum and LSO for the encapsulated traffic; this is
how I initially got into this problem. But as I wrote earlier, I was
able to see the problem without activating the offloads for the inner
packets. Sorry if I didn't mention that. From the mlx4_en NIC
driver's point of view, different streams do map to different RX queues,
because the HW does RSS on the outer (UDP) header and the sending vxlan
code uses a few sockets to send multiple streams, each having a
different source UDP port. For the "outer RSS" you don't need the
-pro card; just make sure the udp_rss module param of mlx4 is set.
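To make the outer-RSS point concrete: the NIC hashes the outer IPv4/UDP header, so streams that differ only in their vxlan source port can land in different RX queues, while streams sharing one outer 4-tuple all hit the same queue. Below is a minimal sketch of Toeplitz-based queue selection, a hash commonly used for RSS. It assumes the widely published Microsoft sample key, a plain modulo in place of the NIC's indirection table, and hypothetical addresses and ports. It is not the mlx4 hardware or driver logic, and whether UDP ports enter the hash at all depends on NIC configuration (for mlx4, the udp_rss parameter mentioned above).

```python
import ipaddress
import struct

# Assumption: the widely published Microsoft RSS sample key. The key a real
# NIC uses is programmed by its driver and will generally differ.
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit, XOR in the 32-bit
    window of the key that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 > key_bits - 32:
        raise ValueError("input too long for this key")
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                pos = i * 8 + bit
                result ^= (key_int >> (key_bits - 32 - pos)) & 0xFFFFFFFF
    return result


def outer_tuple(src_ip: str, dst_ip: str, sport: int, dport: int) -> bytes:
    """RSS input for IPv4 + ports: src addr, dst addr, src port, dst port."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", sport, dport))


def rx_queue(src_ip: str, dst_ip: str, sport: int, dport: int,
             num_queues: int, key: bytes = RSS_KEY) -> int:
    # Simplification: real NICs look the low hash bits up in an indirection
    # table rather than taking a plain modulo.
    return toeplitz_hash(key, outer_tuple(src_ip, dst_ip, sport, dport)) % num_queues


if __name__ == "__main__":
    # Hypothetical outer addresses/ports: same two hosts, one source port per stream.
    for sport in (40000, 40001, 40002, 40003):
        print(sport, rx_queue("192.0.2.1", "192.0.2.2", sport, 4789, 8))
```

If the sender used a single outer source port for every stream, all packets would hash to one queue, which is the single-RX-queue scenario Eric describes; distinct source ports are what give RSS something to spread on.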

I also thought there might be a contention point under veth that could
explain why packets are dropped, but I haven't found it.
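Joseph's workaround, as described in the quoted text, amounts to clearing one bit in the skb's GSO type, presumably something like `skb_shinfo(skb)->gso_type &= ~SKB_GSO_DODGY;` in vxlan.c (the exact patch is not shown here). The Python sketch below only models that bit operation. The flag values are illustrative assumptions, not values copied from a specific kernel, and, as Joseph says, clearing the bit hides the symptom rather than fixing the cause.

```python
from enum import IntFlag


class GsoType(IntFlag):
    # Names follow the kernel's SKB_GSO_* flags; the bit positions here are
    # illustrative assumptions for this sketch.
    TCPV4 = 1 << 0
    UDP = 1 << 1
    DODGY = 1 << 2
    TCP_ECN = 1 << 3
    TCPV6 = 1 << 4


def clear_dodgy(gso_type: GsoType) -> GsoType:
    """Model of the one-line hack: drop DODGY and keep every other GSO bit."""
    return GsoType(int(gso_type) & ~int(GsoType.DODGY))


if __name__ == "__main__":
    from_veth = GsoType.TCPV4 | GsoType.DODGY  # hypothetical skb coming via veth
    print(from_veth, "->", clear_dodgy(from_veth))
```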


> mpstat -P ALL 10 10
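Eric's suggested check, `mpstat -P ALL 10 10`, reports per-CPU %soft, which would show a single CPU saturated in softirq if all vxlan traffic lands on one RX queue. A rough equivalent can be computed from two /proc/stat snapshots. This is a sketch only: the function names are hypothetical, it leaves out the guest columns (the kernel already counts them inside user and nice), and it does not replicate every mpstat detail.

```python
from typing import Dict, List

SOFTIRQ = 6    # column index of softirq time in a /proc/stat "cpuN" line
NON_GUEST = 8  # user..steal; guest/guest_nice are already counted in user/nice


def parse_proc_stat(text: str) -> Dict[str, List[int]]:
    """Map 'cpu0', 'cpu1', ... to their tick counters (aggregate 'cpu' line skipped)."""
    cpus: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus


def softirq_percent(before: str, after: str) -> Dict[str, float]:
    """Share of elapsed ticks each CPU spent in softirq between two snapshots."""
    old, new = parse_proc_stat(before), parse_proc_stat(after)
    result: Dict[str, float] = {}
    for cpu, counters in new.items():
        if cpu not in old:
            continue  # CPU came online between the snapshots
        deltas = [b - a for a, b in zip(old[cpu], counters)]
        total = sum(deltas[:NON_GUEST])
        result[cpu] = 100.0 * deltas[SOFTIRQ] / total if total else 0.0
    return result


def read_proc_stat() -> str:
    with open("/proc/stat") as f:
        return f.read()
```

On the receiver, take `first = read_proc_stat()`, wait about ten seconds (the mpstat interval above), then call `softirq_percent(first, read_proc_stat())`; one CPU near 100% while the others stay low would support the single-queue theory.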