Message-ID: <8e9ea3c4-82e0-a34c-08ea-32a387e4c9e1@solarflare.com>
Date: Wed, 14 Nov 2018 18:07:49 +0000
From: Edward Cree <ecree@...arflare.com>
To: <linux-net-drivers@...arflare.com>, <davem@...emloft.net>
CC: <netdev@...r.kernel.org>, <eric.dumazet@...il.com>
Subject: [PATCH v3 net-next 0/4] net: batched receive in GRO path
This series listifies part of GRO processing, in a manner which allows those
packets which are not GROed (i.e. for which dev_gro_receive returns
GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor is the per-protocol GRO
callback, since GRO's need to hold packets on lists under napi->gro_hash
makes keeping the packets on other lists awkward, and since the GRO control
block state of held skbs can refer only to one 'new' skb at a time.
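As a purely illustrative aid, here is a minimal sketch of the shape this takes
inside net/core/dev.c (not the actual patches): the name napi_gro_receive_list()
comes from patch #4's title, but its signature and the handling of results other
than GRO_NORMAL are assumptions here; dev_gro_receive(), napi_skb_finish() and
skb_gro_reset_offset() are existing static helpers in dev.c.

/* Sketch only, not the real patch: run each skb through dev_gro_receive()
 * and collect those it declines to coalesce (GRO_NORMAL), so they can be
 * delivered in one batch via the listified receive path.
 */
static void napi_gro_receive_list(struct napi_struct *napi,
                                  struct list_head *head)
{
        struct sk_buff *skb, *next;
        LIST_HEAD(normal);      /* skbs GRO did not coalesce */

        list_for_each_entry_safe(skb, next, head, list) {
                gro_result_t ret;

                list_del(&skb->list);
                skb_gro_reset_offset(skb);
                ret = dev_gro_receive(napi, skb);
                if (ret == GRO_NORMAL)
                        /* not coalesced: queue for batched delivery */
                        list_add_tail(&skb->list, &normal);
                else
                        /* merged, held, freed or dropped: finished off
                         * as napi_gro_receive() does today */
                        napi_skb_finish(ret, skb);
        }
        /* hand the whole batch to the listified receive path at once */
        netif_receive_skb_list(&normal);
}

A driver would then accumulate its received skbs on a list in its NAPI poll
loop and call such an entry point once per batch, rather than calling
napi_gro_receive() per packet; that is roughly what patches #1 and #2 arrange
for sfc.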
Performance figures for this series were collected on a back-to-back pair of
Solarflare sfn8522-r2 NICs, using 120-second NetPerf tests. In the stats, the
sample size n for old and new code is 6 runs each; p values are from a Welch t-test.
Tests were run with GRO both enabled and disabled, the latter simulating
uncoalesceable packets (e.g. due to IP or TCP options). The payload size in all
tests was 8000 bytes. BW tests used 4 streams; RR tests used 100.
TCP Stream, GRO on:
net-next: 9.415 Gb/s (line rate); 190% total rxcpu
after #4: 9.415 Gb/s; 192% total rxcpu
p_bw = 0.155; p_cpu = 0.382
TCP Stream, GRO off:
net-next: 5.625 Gb/s
after #4: 6.551 Gb/s
16.5% faster; p < 0.001
TCP RR, GRO on:
net-next: 837.6 us
after #4: 840.0 us
0.3% slower; p = 0.229
TCP RR, GRO off:
net-next: 867.6 us
after #4: 860.1 us
0.9% faster; p = 0.064
UDP Stream (GRO off):
net-next: 7.808 Gb/s
after #4: 7.848 Gb/s
0.5% faster; p = 0.144
Conclusion:
* TCP b/w is 16.5% faster for traffic which cannot be coalesced by GRO.
* TCP latency might be slightly improved in the same case, but it's not
quite statistically significant.
* Both see no statistically significant change in performance with GRO
active.
* UDP throughput might be slightly improved, but it's not statistically
significant. Note that drivers which (unlike sfc) pass UDP traffic to GRO
will probably see gains here, as this gives them access to bundling.
Change history:
v3: Rebased on latest net-next. Re-ran performance tests and added TCP_RR
tests at the suggestion of Eric Dumazet. Expanded changelog of patch #3.
v2: Rebased on latest net-next. Removed RFC tags. Otherwise unchanged
owing to lack of comments on v1.
Edward Cree (4):
net: introduce list entry point for GRO
sfc: use batched receive for GRO
net: make listified RX functions return number of good packets
net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list
drivers/net/ethernet/sfc/efx.c | 11 +++-
drivers/net/ethernet/sfc/net_driver.h | 1 +
drivers/net/ethernet/sfc/rx.c | 16 +++++-
include/linux/netdevice.h | 6 +-
include/net/ip.h | 4 +-
include/net/ipv6.h | 4 +-
net/core/dev.c | 104 ++++++++++++++++++++++++++--------
net/ipv4/ip_input.c | 39 ++++++++-----
net/ipv6/ip6_input.c | 37 +++++++-----
9 files changed, 157 insertions(+), 65 deletions(-)