Message-ID: <CAF=yD-LaDvQdkE_BkZX7o1ukjyodWiwK=nJ5S=bTgJ-91KBhHg@mail.gmail.com>
Date: Wed, 18 Apr 2018 09:49:18 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Network Development <netdev@...r.kernel.org>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH RFC net-next 00/11] udp gso

On Wed, Apr 18, 2018 at 7:17 AM, Paolo Abeni <pabeni@...hat.com> wrote:
> On Tue, 2018-04-17 at 16:00 -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@...gle.com>
>>
>> Segmentation offload reduces cycles/byte for large packets by
>> amortizing the cost of protocol stack traversal.
>>
>> This patchset implements GSO for UDP. A process can concatenate and
>> submit multiple datagrams to the same destination in one send call
>> by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
>> or passing an analogous cmsg at send time.
>>
>> The stack will send the entire large (up to network layer max size)
>> datagram through the protocol layer. At the GSO layer, it is broken
>> up in individual segments. All receive the same network layer header
>> and UDP src and dst port. All but the last segment have the same UDP
>> header, but the last may differ in length and checksum.
>
> This is interesting, thanks for sharing!
>
> I have some local patches somewhere implementing UDP GRO, but I never
> tried to upstream them, since I lacked the associated GSO and I thought
> that the use-case was not too relevant.
>
> Given that your use-case is a connected socket - no per packet route
> lookup - how does GSO performs compared to plain sendmmsg()? Have you
> considered using and/or improving the latter?
>
> When testing with Spectre/Meltdown mitigation in places, I expect that
> the most relevant part of the gain is due to the single syscall per
> burst.

Somewhat to my surprise, the main benefit is actually not route lookup
avoidance. The benchmark can be run in both connected and unconnected
('-u') mode. Both saturate the CPU, so I am only showing throughput:
[connected] udp tx: 825 MB/s 588336 calls/s 14008 msg/s
[unconnected] udp tx: 711 MB/s 506646 calls/s 12063 msg/s

The gap between the two modes, about 15%, corresponds to results
previously seen with other applications.
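
As a quick check of that figure, the relative gain can be computed from
the two throughput numbers above. This is only an illustrative sketch;
the helper name gain_pct is made up for this example and is not part of
the benchmark.

```c
#include <assert.h>

/*
 * Relative gain of `value` over `baseline`, in percent.
 * Hypothetical helper, used here only to check the arithmetic.
 */
static double gain_pct(double value, double baseline)
{
    assert(baseline > 0.0);
    return (value - baseline) / baseline * 100.0;
}
```

With the figures above, gain_pct(825.0, 711.0) comes out at roughly 16%.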

When looking at a perf report, there is no clear hot spot, which
indicates that the savings accrue across the protocol stack traversal.

I just hacked up a sendmmsg() extension to the benchmark to verify.
Indeed, that does not have nearly the same benefit as GSO:
udp tx: 976 MB/s 695394 calls/s 16557 msg/s
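
For readers unfamiliar with sendmmsg(), a minimal sketch of sending a
burst of equal-sized datagrams in one call follows. It is not the code
from the benchmark: send_burst and BURST_MAX are hypothetical names, and
it assumes a connected datagram socket and a contiguous buffer holding
count * seg_len bytes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sendmmsg() on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BURST_MAX 64            /* hypothetical cap on datagrams per call */

/*
 * Send `count` datagrams of `seg_len` bytes each, taken back to back
 * from `buf`, over a connected datagram socket with one sendmmsg() call.
 * Returns the number of datagrams sent (may be less than `count`),
 * or -1 on error.
 */
static int send_burst(int fd, char *buf, size_t seg_len, unsigned int count)
{
    struct mmsghdr msgs[BURST_MAX];
    struct iovec iov[BURST_MAX];
    unsigned int i;

    assert(count > 0 && count <= BURST_MAX);

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < count; i++) {
        /* One iovec per datagram, pointing at its slice of the buffer. */
        iov[i].iov_base = buf + (size_t)i * seg_len;
        iov[i].iov_len = seg_len;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    return sendmmsg(fd, msgs, count, 0);
}
```

This saves system calls, but the kernel still processes each datagram
in the burst separately.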

This matches the numbers seen with TCP when TSO and GSO are disabled.
That configuration also makes few system calls, but still traverses
the stack once per MTU-sized packet.
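
To make the per-MTU cost concrete, the sketch below computes how many
wire segments one large send turns into, and how long the last one is.
The names are hypothetical, and the 1472-byte segment size used in the
note after the code is an assumption (1500-byte MTU minus IPv4 and UDP
headers), not a value taken from the benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Result of splitting one large payload into fixed-size segments. */
struct gso_split {
    size_t nsegs;       /* number of segments on the wire */
    size_t last_len;    /* payload length of the final segment */
};

/*
 * Split `payload_len` bytes into segments of `seg_size` bytes.
 * All segments but the last are exactly `seg_size` long; the last one
 * carries whatever remains and may be shorter.
 */
static struct gso_split gso_split_len(size_t payload_len, size_t seg_size)
{
    struct gso_split s;

    assert(payload_len > 0 && seg_size > 0);

    s.nsegs = (payload_len + seg_size - 1) / seg_size;  /* round up */
    s.last_len = payload_len - (s.nsegs - 1) * seg_size;
    return s;
}
```

Under that assumption, a 61824-byte send splits into 42 full segments,
which is consistent with the calls/s to msg/s ratio of about 42 in the
output above, and a 3000-byte send splits into 3 segments with the last
one carrying 56 bytes. Without GSO, each of those segments traverses the
protocol stack separately. With GSO, the stack is traversed once for the
whole send and the split happens at the GSO layer.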

I pushed the branch to my github at

  https://github.com/wdebruij/linux/tree/udpgso-20180418

and also the version I sent for RFC yesterday at

  https://github.com/wdebruij/linux/tree/udpgso-rfc-v1