Date:	Wed, 21 Jan 2015 12:35:43 -0800
From:	Jesse Gross <jesse@...ira.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	Pravin Shelar <pshelar@...ira.com>,
	David Miller <davem@...emloft.net>,
	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 0/3] openvswitch: Add STT support.

On Wed, Jan 21, 2015 at 11:45 AM, Tom Herbert <therbert@...gle.com> wrote:
>> I used bare-metal Intel servers. All VXLAN tests were done using the
>> Linux kernel device without any VMs. All STT tests were done using an
>> OVS bridge and an STT port.
>>
> So right off the bat you're running the baseline differently than the
> target. Anyway, I cannot replicate your numbers for VXLAN; I see much
> better performance, and that is with pretty old servers and dumb NICs.
> I suspect you might not have GSO/GRO properly enabled, but instead of
> trying to debug your setup, I'd rather restate my request that you
> provide a network interface to STT so we can do our own fair
> comparison.

If I had to guess, I suspect the difference is that UDP RSS wasn't
enabled, since it isn't turned on out of the box on most NICs.
Regardless, you can clearly see a significant difference in single-core
performance and CPU consumption.
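
For reference, UDP 4-tuple hashing usually has to be enabled per
device, typically via ethtool. This is just an illustration: eth0 is a
placeholder, and whether a given NIC/driver honors these hash fields
varies:

  # show the current receive hash fields for UDP over IPv4
  ethtool -n eth0 rx-flow-hash udp4

  # hash on src/dst IP (s, d) plus src/dst port (f, n)
  ethtool -N eth0 rx-flow-hash udp4 sdfn

Without this, all flows of a UDP tunnel between two endpoints tend to
land on one queue, and thus one core.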

STT has been fairly well known in network virtualization circles for
the past few years and has some large deployments, so the reported
performance is not a fluke. I remember Pankaj from Microsoft also
mentioning to you that they weren't able to get performance to a
reasonable level without TSO. That's obviously a totally different
environment, but the reasoning is the same.

>>>> VXLAN:
>>>> CPU
>>>>   Client: 1.6
>>>>   Server: 14.2
>>>> Throughput: 5.6 Gbit/s
>>>>
>>>> VXLAN with rcsum:
>>>> CPU
>>>>   Client: 0.89
>>>>   Server: 12.4
>>>> Throughput: 5.8 Gbit/s
>>>>
>>>> STT:
>>>> CPU
>>>>   Client: 1.28
>>>>   Server: 4.0
>>>> Throughput: 9.5 Gbit/s
>>>>
>>> 9.5Gbps? Rounding error, or is this 40Gbps or a larger-than-1500-byte MTU?
>>>
>> Nope, it's the same as the VXLAN setup: a 10Gbps NIC with a 1500-byte MTU.
>>
> That would exceed the theoretical maximum for TCP over 10Gbps
> Ethernet. How are you measuring throughput? How many bytes of protocol
> headers are there in the STT case?

For large-packet cases, STT actually has less header overhead than the
unencapsulated traffic stream. This is because, for the group of STT
packets generated by a single TSO burst from the guest, there is only
one copy of the inner header. And even though TCP headers are used for
encapsulation, they carry no options, whereas the inner headers
typically contain timestamps. Over the course of the ~45 packets that
can be generated from a maximum-sized transmission, this results in
negative encapsulation overhead.
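
As a rough back-of-the-envelope check (my assumptions, not measured
values: a ~64KB burst, 12 bytes of inner TCP timestamp options, an
18-byte STT frame header per the draft, and outer Ethernet framing
ignored since it is common to both cases):

  # Header accounting for one ~64KB TSO burst over a 1500-byte MTU.
  # Header sizes below are assumptions, not measurements.

  MTU = 1500
  PAYLOAD = 45 * 1448              # ~64KB of application data

  # Plain TCP: every wire packet repeats IP (20B) + TCP with
  # timestamp options (20B + 12B).
  inner_hdr = 20 + 20 + 12                      # 52B per packet
  pkts_plain = -(-PAYLOAD // (MTU - inner_hdr)) # ceil: 45 packets
  overhead_plain = pkts_plain * inner_hdr       # 2340B

  # STT: inner Ethernet/IP/TCP headers and the STT frame header
  # appear once per burst; each wire segment adds only outer IP (20B)
  # plus an option-less TCP-like header (20B).
  stt_once = 14 + inner_hdr + 18                # 84B, once per burst
  outer_hdr = 20 + 20                           # 40B per packet
  pkts_stt = -(-(PAYLOAD + stt_once) // (MTU - outer_hdr))  # 45 packets
  overhead_stt = pkts_stt * outer_hdr + stt_once            # 1884B

  print(f"plain TCP: {pkts_plain} packets, {overhead_plain} header bytes")
  print(f"STT:       {pkts_stt} packets, {overhead_stt} header bytes")

With standard framing, plain TCP goodput on 10GbE tops out around
10 * 1448/1538 ~= 9.4 Gbit/s, so once the per-packet header overhead
drops below that of the native stream, a measured 9.5 Gbit/s stops
being implausible.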

I would recommend you take a look at the draft if you haven't already:
http://tools.ietf.org/html/draft-davie-stt-06

It is currently in the final stages of the RFC publication process.
