Message-ID: <1386105850.30495.38.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Tue, 03 Dec 2013 13:24:10 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Or Gerlitz <or.gerlitz@...il.com>
Cc: Joseph Gasparakis <joseph.gasparakis@...el.com>,
Jerry Chu <hkchu@...gle.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Eric Dumazet <edumazet@...gle.com>,
Alexei Starovoitov <ast@...mgrid.com>,
Pravin B Shelar <pshelar@...ira.com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>
Subject: Re: vxlan/veth performance issues on net.git + latest kernels
On Tue, 2013-12-03 at 23:09 +0200, Or Gerlitz wrote:
> On Tue, Dec 3, 2013 at 11:11 PM, Joseph Gasparakis
> <joseph.gasparakis@...el.com> wrote:
>
> >>> lack of GRO : receiver seems to not be able to receive as fast as you want.
> >>>> TCPOFOQueue: 3167879
> >>> So many packets are received out of order (because of losses)
>
> >> I see that there's no GRO also for the non-veth tests which involve
> >> vxlan, and there the receiving side is capable of consuming the
> >> packets. Do you have a rough explanation of why adding veth to the
> >> chain is such a game changer that makes things start falling apart?
>
> > I have seen this before. Here are my findings:
> >
> > The gso_type is different if the skb comes from veth or not. From veth,
> > you will see SKB_GSO_DODGY set. This breaks things: when an skb with
> > DODGY set moves from vxlan to the driver through dev_hard_start_xmit,
> > the stack drops it silently. I never got the time to find the root cause
> > for this, but I know it causes re-transmissions and big performance
> > degradation.
> >
> > I went as far as quickly hacking a one-liner unsetting the DODGY bit
> > in vxlan.c; that bypassed the issue and recovered the lost performance,
> > but obviously this is not a real fix.
>
> thanks for the heads up, a few quick questions/clarifications --
>
> -- you are talking about drops at the sender side, correct? Eric was
> saying we have evidence that the drops happen on the receiver.
I suggested you take a look at the receiver, e.g. with "ifconfig -a".

I suspect one cpu is 100% in softirq mode, draining packets from the NIC
and feeding the IP / TCP stack.
Because of the vxlan encap, all the packets are delivered to a single RX
queue (I don't think mlx4 is able to look at the inner headers to get L4
info).
mpstat -P ALL 10 10