Message-ID: <aac93b13-6298-b9eb-7f3c-b074f22c388c@hpe.com>
Date: Thu, 1 Dec 2016 11:48:14 -0800
From: Rick Jones <rick.jones2@....com>
To: Tom Herbert <tom@...bertland.com>,
Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Initial thoughts on TXDP
On 12/01/2016 11:05 AM, Tom Herbert wrote:
> For the GSO and GRO the rationale is that performing the extra SW
> processing to do the offloads is significantly less expensive than
> running each packet through the full stack. This is true in a
> multi-layered generalized stack. In TXDP, however, we should be able
> to optimize the stack data path such that that would no longer be
> true. For instance, if we can process the packets received on a
> connection quickly enough so that it's about the same or just a little
> more costly than GRO processing then we might bypass GRO entirely.
> TSO is probably still relevant in TXDP since it reduces overheads
> processing TX in the device itself.
Just how much per-packet path-length are you thinking will go away under
the likes of TXDP? It is admittedly "just" netperf, but losing TSO/GSO
does some non-trivial things to effective overhead (service demand) and
thus to throughput:
stack@...cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -- -P 12867
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9260.24   2.02     -1.00    0.428   -1.000
stack@...cp1-c0-m1-mgmt:~/rjones2$ sudo ethtool -K hed0 tso off gso off
stack@...cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -- -P 12867
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      5621.82   4.25     -1.00    1.486   -1.000
And that is still with the stretch-ACKs induced by GRO at the receiver.
Losing GRO has quite similar results:
stack@...cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -t TCP_MAERTS -- -P 12867
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Recv     Send     Recv    Send
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9154.02   4.00     -1.00    0.860   -1.000
stack@...cp1-c0-m1-mgmt:~/rjones2$ sudo ethtool -K hed0 gro off
stack@...cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -t TCP_MAERTS -- -P 12867
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Recv     Send     Recv    Send
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      4212.06   5.36     -1.00    2.502   -1.000
I'm sure there is a very non-trivial "it depends" component here -
netperf will get the peak benefit from *SO, and so will show the peak
difference in service demand - but even if one gets only 6 segments per
*SO, that is a lot of path-length to make up.
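Back-of-envelope on the numbers above (a sketch only: it assumes netperf's
KB is 1024 bytes, a 1448-byte MSS, roughly 44 segments per full 64KB *SO
burst, and a simple fixed-plus-per-burst cost model, so treat the output as
order-of-magnitude):

# Per-segment send-side CPU cost implied by the service demands above.
KB  = 1024.0    # netperf service demand is us per KB (assumed 1024 bytes)
MSS = 1448.0    # assumed TCP segment payload on this path
GHZ = 2.5       # E5-2640 nominal clock

sd_on, sd_off = 0.428, 1.486            # us/KB, TSO/GSO on vs. off

per_seg_on  = sd_on  * MSS / KB         # ~0.61 us per segment
per_seg_off = sd_off * MSS / KB         # ~2.10 us per segment
print((per_seg_off - per_seg_on) * GHZ * 1000, "cycles/segment saved")  # ~3700

# Crude model: cost(k) = a + b/k, k = segments per *SO burst; fit a and b
# from the two measured points (k = 1 offloads off, k ~ 44 offloads on).
k_full = 44.0
b = (per_seg_off - per_seg_on) / (1.0 - 1.0 / k_full)
a = per_seg_off - b
per_seg_6 = a + b / 6.0                 # only 6 segments per burst
print((per_seg_off - per_seg_6) * GHZ * 1000, "cycles/segment saved at 6:1")  # ~3200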
4.4 kernel, BE3 NICs ... E5-2640 0 @ 2.50GHz
And even if one does have the CPU cycles to burn, so to speak, the
effect on power consumption needs to be included in the calculus.
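For the power side of that calculus, one rough option on boxes like these is
to sample the RAPL package-energy counter around a netperf run via the Linux
powercap sysfs interface, assuming it is present and that intel-rapl:0 is the
package domain on the machine (check its 'name' file); the counter wraps,
which this sketch ignores for short runs:

# Average package power across a command, e.g. a netperf run.
import subprocess, sys, time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"   # cumulative microjoules

def read_uj():
    with open(RAPL) as f:
        return int(f.read())

e0, t0 = read_uj(), time.time()
subprocess.run(sys.argv[1:], check=True)   # e.g. ./netperf -c -H <host> ...
e1, t1 = read_uj(), time.time()

print("avg package power: %.1f W over %.1f s"
      % ((e1 - e0) / 1e6 / (t1 - t0), t1 - t0))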
happy benchmarking,
rick jones