netdev - Re: [PATCH RFC net-next 00/11] udp gso

Message-ID: <CAF=yD-LaDvQdkE_BkZX7o1ukjyodWiwK=nJ5S=bTgJ-91KBhHg@mail.gmail.com>
Date:   Wed, 18 Apr 2018 09:49:18 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH RFC net-next 00/11] udp gso

On Wed, Apr 18, 2018 at 7:17 AM, Paolo Abeni <pabeni@...hat.com> wrote:
> On Tue, 2018-04-17 at 16:00 -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@...gle.com>
>>
>> Segmentation offload reduces cycles/byte for large packets by
>> amortizing the cost of protocol stack traversal.
>>
>> This patchset implements GSO for UDP. A process can concatenate and
>> submit multiple datagrams to the same destination in one send call
>> by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
>> or passing an analogous cmsg at send time.
>>
>> The stack will send the entire large (up to network layer max size)
>> datagram through the protocol layer. At the GSO layer, it is broken
>> up into individual segments. All receive the same network layer header
>> and UDP src and dst port. All but the last segment have the same UDP
>> header, but the last may differ in length and checksum.
>
> This is interesting, thanks for sharing!
>
> I have some local patches somewhere implementing UDP GRO, but I never
> tried to upstream them, since I lacked the associated GSO and I thought
> that the use-case was not too relevant.
>
> Given that your use-case is a connected socket - no per packet route
> lookup - how does GSO perform compared to plain sendmmsg()? Have you
> considered using and/or improving the latter?
>
> When testing with Spectre/Meltdown mitigations in place, I expect that
> the most relevant part of the gain is due to the single syscall per
> burst.

Somewhat to my surprise, the main benefit is actually not route lookup
avoidance. The benchmark can be run in both connected and unconnected
('-u') mode. Both modes saturate the CPU, so I am only showing
throughput:

[connected]    udp tx:    825 MB/s   588336 calls/s  14008 msg/s
[unconnected]  udp tx:    711 MB/s   506646 calls/s  12063 msg/s

The gap of about 15% between the two modes corresponds to results
previously seen with other applications.
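
As a quick sanity check, the gap between the two modes can be recomputed
from the connected and unconnected figures above. This is only
arithmetic on the reported numbers; the helper name relative_gain is
made up for illustration and is not part of the benchmark.

```python
# Figures copied from the connected / unconnected benchmark lines.
connected = {"MB/s": 825, "calls/s": 588336, "msg/s": 14008}
unconnected = {"MB/s": 711, "calls/s": 506646, "msg/s": 12063}


def relative_gain(new, old):
    """Fractional improvement of `new` over `old` (0.10 means 10%)."""
    return new / old - 1.0


# One gain per reported metric, connected relative to unconnected.
gains = {k: relative_gain(connected[k], unconnected[k]) for k in connected}

for metric, gain in gains.items():
    print(f"{metric:8s} connected vs unconnected: {gain:+.1%}")
```

All three metrics move by nearly the same fraction, which is expected
when every call sends a payload of the same size.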

When looking at a perf report, there is no clear hot spot, which
indicates that the savings accrue across the protocol stack traversal.

I just hacked up a sendmmsg extension to the benchmark to verify.
Indeed that does not have nearly the same benefit as GSO:

udp tx:    976 MB/s   695394 calls/s  16557 msg/s

This matches the numbers seen from TCP without TSO and GSO.
That path also makes few system calls, but still traverses the stack
once per MTU-sized packet.
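
To make the distinction concrete, here is a toy counting model, not a
measurement. It assumes the cost of sending splits into a per-syscall
part and a per-stack-traversal part, and that sendmmsg and GSO carry
the same number of datagrams per call. The function name cost_model
and the batch parameter are hypothetical and do not come from the
benchmark.

```python
from math import ceil


def cost_model(n_datagrams, batch):
    """Return {strategy: (syscalls, stack_traversals)} for n_datagrams.

    Toy assumptions (not measured):
    - send():     one syscall and one stack traversal per datagram
    - sendmmsg(): one syscall per batch, but still one traversal
                  per datagram
    - UDP GSO:    one syscall and one traversal per batch; the split
                  into wire-size segments happens after the traversal
    """
    batched_calls = ceil(n_datagrams / batch)
    return {
        "send": (n_datagrams, n_datagrams),
        "sendmmsg": (batched_calls, n_datagrams),
        "udp_gso": (batched_calls, batched_calls),
    }


if __name__ == "__main__":
    for name, (calls, traversals) in cost_model(420, 42).items():
        print(f"{name:9s} syscalls={calls:4d} traversals={traversals:4d}")
```

In this model sendmmsg only reduces the syscall column, while GSO
reduces both, which is consistent with a perf report that shows no
single hot spot.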

I pushed the branch to my github at

  https://github.com/wdebruij/linux/tree/udpgso-20180418

and also the version I sent for RFC yesterday at

  https://github.com/wdebruij/linux/tree/udpgso-rfc-v1