Message-ID: <CA+mtBx-v5dvX-OPGeWJb6uwQGhJ5T6iewD83xJGqXb+Ggf+LQw@mail.gmail.com>
Date: Wed, 21 Jan 2015 11:45:52 -0800
From: Tom Herbert <therbert@...gle.com>
To: Pravin Shelar <pshelar@...ira.com>
Cc: David Miller <davem@...emloft.net>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 0/3] openvswitch: Add STT support.
> I used bare metal intel servers. All VXLAN tests were done using linux
> kernel device without any VMs. All STT tests are done using OVS bridge
> and STT port.
>
So right off the bat you're running the baseline differently than the
target. Anyway, I cannot replicate your numbers for VXLAN; I see much
better performance, and that is with pretty old servers and dumb NICs.
I suspect you might not have GSO/GRO properly enabled, but instead of
trying to debug your setup, I'd rather restate my request that you
provide a network interface to STT so we can do our own fair
comparison.
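For what it's worth, a quick sanity check of the offload state is
"ethtool -k" on the devices in the data path. The sketch below is just
my own illustration (the device name is a placeholder, not something
from your setup):

    # Dump the segmentation/receive offload state of a NIC via "ethtool -k".
    # The device name "eth0" is a placeholder; adjust for the actual setup.
    import subprocess

    def offload_state(dev="eth0"):
        out = subprocess.run(["ethtool", "-k", dev],
                             capture_output=True, text=True, check=True).stdout
        wanted = {"tcp-segmentation-offload", "generic-segmentation-offload",
                  "generic-receive-offload", "large-receive-offload"}
        return {k.strip(): v.strip()
                for k, _, v in (line.partition(":") for line in out.splitlines())
                if k.strip() in wanted}

    print(offload_state())
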
>> Another thing to consider in your analysis is the performance with
>> flows using small packets. STT should demonstrate better performance
>> with bulk flows since LSO and LRO are better performing relative to
>> GSO and GRO. But for flows with small packets, I don't see how there
>> could be any performance advantage. We already have ways to leverage
>> simple UDP checksum offload with UDP encapsulations, so STT might
>> just represent unnecessary header overhead in those cases.
>>
> All tunneling protocols have performance issues with small packets; I
> do not see how that is related to the STT patch. STT also makes use of
> checksum offload, so there should not be much overhead.
>
Given the pervasiveness of these patches and the fact that this is
"modifying" the definition of the TCP protocol we see on the wire, I
would like to see more effort put into analyzing the performance and
effects of this encapsulation. I assume that STT is intended to be
used for small packets, so it seems entirely reasonable to ask what
exactly the performance effects are for that case. It's really not
that hard to run some performance numbers with comparisons (TCP_RR,
TCP_STREAM, TCP_CRR, with IPv6, etc.) and report them in the patch set
description. You can look at FOU, GUE, vxlan rco, and the checksum
patches I did for some examples.
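To be concrete, a trivial driver for that kind of matrix could look
like the sketch below (illustrative only; the netserver address and
run length are placeholders, and netperf has to be set up on both
ends):

    # Sweep the netperf tests mentioned above against a remote netserver.
    # The server address and test length are placeholders.
    import subprocess

    SERVER = "192.0.2.1"
    TESTS = ["TCP_STREAM", "TCP_RR", "TCP_CRR"]

    for test in TESTS:
        # -H remote host, -t test name, -l test length in seconds
        out = subprocess.run(["netperf", "-H", SERVER, "-t", test, "-l", "30"],
                             capture_output=True, text=True).stdout
        print("=== %s ===" % test)
        print(out)
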
>>> VXLAN:
>>> CPU
>>> Client: 1.6
>>> Server: 14.2
>>> Throughput: 5.6 Gbit/s
>>>
>>> VXLAN with rcsum:
>>> CPU
>>> Client: 0.89
>>> Server: 12.4
>>> Throughput: 5.8 Gbit/s
>>>
>>> STT:
>>> CPU
>>> Client: 1.28
>>> Server: 4.0
>>> Throughput: 9.5 Gbit/s
>>>
>> 9.5Gbps? Rounding error or is this 40Gbps or larger than 1500 byte MTU?
>>
> Nope, it's the same as the VXLAN setup: 10Gbps NIC with 1500 MTU.
>
That would exceed the theoretical maximum for TCP over 10Gbps
Ethernet. How are you measuring throughput? How many bytes of protocol
headers are in the STT case?
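For reference, here is the back-of-the-envelope arithmetic I am using
(my own numbers, assuming standard Ethernet framing overhead and a
1500-byte MTU, not anything from your report):

    # Goodput ceiling over 10GbE at 1500 MTU.
    LINK = 10e9                           # bits/s
    WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12   # preamble, SFD, Eth hdr, FCS, IFG
    MTU = 1500

    def goodput(header_bytes):
        """Payload rate given the IP-and-above header bytes in each frame."""
        return LINK * (MTU - header_bytes) / (MTU + WIRE_OVERHEAD)

    # Plain TCP: 20 bytes IPv4 + 20 bytes TCP (no options) -> ~9.49 Gbit/s
    print(goodput(20 + 20) / 1e9)
    # Any encapsulation (outer IP, outer TCP-like or UDP header, tunnel
    # header, inner Ethernet/IP/TCP) only adds header bytes per frame and
    # pushes the ceiling below that.
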
Thanks,
Tom
> Thanks.