Message-ID: <8e9ea3c4-82e0-a34c-08ea-32a387e4c9e1@solarflare.com>
Date: Wed, 14 Nov 2018 18:07:49 +0000
From: Edward Cree <ecree@...arflare.com>
To: <linux-net-drivers@...arflare.com>, <davem@...emloft.net>
CC: <netdev@...r.kernel.org>, <eric.dumazet@...il.com>
Subject: [PATCH v3 net-next 0/4] net: batched receive in GRO path
This series listifies part of GRO processing, in a manner which allows those
packets which are not GROed (i.e. for which dev_gro_receive returns
GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor is the per-protocol GRO
callback, since GRO's need to hold packets on lists under napi->gro_hash
makes keeping the packets on other lists awkward, and since the GRO control
block state of held skbs can refer only to one 'new' skb at a time.
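As a purely illustrative aid, here is a minimal sketch of the shape this takes
inside net/core/dev.c (not the actual patches): the name napi_gro_receive_list()
comes from patch #4's title, but its signature and the handling of results other
than GRO_NORMAL are assumptions here; dev_gro_receive(), napi_skb_finish() and
skb_gro_reset_offset() are existing static helpers in dev.c.

/* Sketch only, not the real patch: run each skb through dev_gro_receive()
 * and collect those it declines to coalesce (GRO_NORMAL), so they can be
 * delivered in one batch via the listified receive path.
 */
static void napi_gro_receive_list(struct napi_struct *napi,
                                  struct list_head *head)
{
        struct sk_buff *skb, *next;
        LIST_HEAD(normal);      /* skbs GRO did not coalesce */

        list_for_each_entry_safe(skb, next, head, list) {
                gro_result_t ret;

                list_del(&skb->list);
                skb_gro_reset_offset(skb);
                ret = dev_gro_receive(napi, skb);
                if (ret == GRO_NORMAL)
                        /* not coalesced: queue for batched delivery */
                        list_add_tail(&skb->list, &normal);
                else
                        /* merged, held, freed or dropped: finished off
                         * as napi_gro_receive() does today */
                        napi_skb_finish(ret, skb);
        }
        /* hand the whole batch to the listified receive path at once */
        netif_receive_skb_list(&normal);
}

A driver would then accumulate its received skbs on a list in its NAPI poll
loop and call such an entry point once per batch, rather than calling
napi_gro_receive() per packet; that is roughly what patches #1 and #2 arrange
for sfc.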
Performance figures for this series were collected on a back-to-back pair of
Solarflare sfn8522-r2 NICs, using 120-second NetPerf tests. In the stats, the
sample size n for old and new code is 6 runs each; p values are from a Welch t-test.
Tests were run with GRO both enabled and disabled, the latter simulating
uncoalesceable packets (e.g. due to IP or TCP options). The payload size in all
tests was 8000 bytes. BW tests used 4 streams; RR tests used 100.
TCP Stream, GRO on:
net-next: 9.415 Gb/s (line rate); 190% total rxcpu
after #4: 9.415 Gb/s; 192% total rxcpu
p_bw = 0.155; p_cpu = 0.382
TCP Stream, GRO off:
net-next: 5.625 Gb/s
after #4: 6.551 Gb/s
16.5% faster; p < 0.001
TCP RR, GRO on:
net-next: 837.6 us
after #4: 840.0 us
0.3% slower; p = 0.229
TCP RR, GRO off:
net-next: 867.6 us
after #4: 860.1 us
0.9% faster; p = 0.064
UDP Stream (GRO off):
net-next: 7.808 Gb/s
after #4: 7.848 Gb/s
0.5% faster; p = 0.144
Conclusion:
* TCP b/w is 16.5% faster for traffic which cannot be coalesced by GRO.
* TCP latency might be slightly improved in the same case, but it's not
quite statistically significant.
* Both see no statistically significant change in performance with GRO
active.
* UDP throughput might be slightly improved, but it's not statistically
significant. Note that drivers which (unlike sfc) pass UDP traffic to GRO
will probably see gains here, as this gives them access to bundling.
Change history:
v3: Rebased on latest net-next. Re-ran performance tests and added TCP_RR
tests at the suggestion of Eric Dumazet. Expanded changelog of patch #3.
v2: Rebased on latest net-next. Removed RFC tags. Otherwise unchanged
owing to lack of comments on v1.
Edward Cree (4):
net: introduce list entry point for GRO
sfc: use batched receive for GRO
net: make listified RX functions return number of good packets
net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list
drivers/net/ethernet/sfc/efx.c | 11 +++-
drivers/net/ethernet/sfc/net_driver.h | 1 +
drivers/net/ethernet/sfc/rx.c | 16 +++++-
include/linux/netdevice.h | 6 +-
include/net/ip.h | 4 +-
include/net/ipv6.h | 4 +-
net/core/dev.c | 104 ++++++++++++++++++++++++++--------
net/ipv4/ip_input.c | 39 ++++++++-----
net/ipv6/ip6_input.c | 37 +++++++-----
9 files changed, 157 insertions(+), 65 deletions(-)