Message-ID: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>
Date: Mon, 25 Jun 2018 11:04:13 -0700
From: Amritha Nambiar <amritha.nambiar@...el.com>
To: netdev@...r.kernel.org, davem@...emloft.net
Cc: alexander.h.duyck@...el.com, willemdebruijn.kernel@...il.com,
amritha.nambiar@...el.com, sridhar.samudrala@...el.com,
alexander.duyck@...il.com, edumazet@...gle.com,
hannes@...essinduktion.org, tom@...bertland.com
Subject: [net-next PATCH v4 0/7] Symmetric queue selection using XPS for Rx
queues
This patch series adds support for Tx queue selection based on an
Rx queue(s) map. The map is configured per Tx queue through a sysfs
attribute. If the user's Rx queue configuration does not apply, Tx queue
selection falls back to XPS using the CPUs map and finally to hashing.
XPS is refactored to support Tx queue selection based on either the
CPUs map or the Rx-queues map. The config option CONFIG_XPS must be
enabled. By default, no receive queues are configured for a Tx queue.
- /sys/class/net/<dev>/queues/tx-*/xps_rxqs
A set of receive queues can be mapped to a set of transmit queues (many:many),
although the common use case is a 1:1 mapping. This enables sending packets
on the same Tx-Rx queue association, which is useful for busy-polling
multi-threaded workloads where it is not possible to pin the threads to
a CPU. This is a rework of Sridhar's patch for symmetric queueing via
socket option:
https://www.spinics.net/lists/netdev/msg453106.html
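
As an illustration of the 1:1 mapping described above, here is a minimal
sketch of how the xps_rxqs attribute might be populated. It assumes the
attribute accepts a hexadecimal bitmap of Rx queue indices (the cover
letter does not state the format). The helper names rxq_mask and
map_symmetric, and the SYSFS_ROOT override, are hypothetical and exist
only for this example.

```shell
#!/bin/sh
# Sketch only: map Rx queue N to Tx queue N through xps_rxqs.
# Assumption: xps_rxqs takes a hex bitmap in which bit N selects Rx queue N.

# Print the hex bitmap that has only bit $1 (an Rx queue index) set.
rxq_mask() {
    printf '%x\n' "$((1 << $1))"
}

# Write a 1:1 Rx->Tx mapping for the first $2 queues of interface $1.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a scratch directory instead of the real sysfs tree.
map_symmetric() {
    iface=$1
    nqueues=$2
    root=${SYSFS_ROOT:-/sys/class/net}
    i=0
    while [ "$i" -lt "$nqueues" ]; do
        rxq_mask "$i" > "$root/$iface/queues/tx-$i/xps_rxqs"
        i=$((i + 1))
    done
}
```

With the 16 combined queues used in the testing hints, this would be
invoked as root as "map_symmetric $iface 16". The sketch only covers
queue indices that fit in a single bitmap word.
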
Testing Hints:
Kernel: Linux 4.17.0-rc7+
Interface:
    driver: ixgbe
    version: 5.1.0-k
    firmware-version: 0x00015e0b

Configuration:
    ethtool -L $iface combined 16
    ethtool -C $iface rx-usecs 1000
    sysctl net.core.busy_poll=1000
    ATR disabled:
    ethtool -K $iface ntuple on

Workload:
    Modified memcached that changes the thread selection policy to be based
    on the incoming rx-queue of a connection, using the SO_INCOMING_NAPI_ID
    socket option. The default policy is round-robin.

    Default: no rxqs_map configured
    Symmetric queues: rxqs_map enabled for all queues, each Rx queue mapped
    1:1 to a Tx queue

System:
    Architecture: x86_64
    CPU(s): 72
    Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

16 threads 400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 4/51/2215               2/30/5163
(usec)

intr/sec                        26655                   18606

contextswitch/sec               5145                    4044

insn per cycle                  0.43                    0.72

cache-misses                    6.919                   4.310
(% of all cache refs)

L1-dcache-load-misses           4.49                    3.29
(% of all L1-dcache hits)

LLC-load-misses                 13.26                   8.96
(% of all LL-cache hits)
-------------------------------------------------------------------------------

32 threads 400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 10/112/5562             9/46/4637
(usec)

intr/sec                        30456                   27666

contextswitch/sec               7552                    5133

insn per cycle                  0.41                    0.49

cache-misses                    9.357                   2.769
(% of all cache refs)

L1-dcache-load-misses           4.09                    3.98
(% of all L1-dcache hits)

LLC-load-misses                 12.96                   3.96
(% of all LL-cache hits)
-------------------------------------------------------------------------------

16 threads 800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 5/151/4989              9/69/2611
(usec)

intr/sec                        35686                   22907

contextswitch/sec               25522                   12281

insn per cycle                  0.67                    0.74

cache-misses                    8.652                   6.38
(% of all cache refs)

L1-dcache-load-misses           3.19                    2.86
(% of all L1-dcache hits)

LLC-load-misses                 16.53                   11.99
(% of all LL-cache hits)
-------------------------------------------------------------------------------

32 threads 800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 6/163/6152              8/88/4209
(usec)

intr/sec                        47079                   26548

contextswitch/sec               42190                   39168

insn per cycle                  0.45                    0.54

cache-misses                    8.798                   4.668
(% of all cache refs)

L1-dcache-load-misses           6.55                    6.29
(% of all L1-dcache hits)

LLC-load-misses                 13.91                   10.44
(% of all LL-cache hits)
-------------------------------------------------------------------------------

v4:
- Removed enum for map types and used boolean to identify rxqs_map vs cpus_map.
- Added comments for helper functions.
- Added another static_key for rxqs_map (xps_rxqs_needed).
- New patch to change tx_queue_mapping in sock_common to unsigned short.
- Separated marking receive queue number into a standalone patch.
- Changed wording in documentation (queue-pair to queue-association)
---
Amritha Nambiar (7):
      net: Refactor XPS for CPUs and Rx queues
      net: Use static_key for XPS maps
      net: sock: Change tx_queue_mapping in sock_common to unsigned short
      net: Record receive queue number for a connection
      net: Enable Tx queue selection based on Rx queues
      net-sysfs: Add interface for Rx queue(s) map per Tx queue
      Documentation: Add explanation for XPS using Rx-queue(s) map

 Documentation/ABI/testing/sysfs-class-net-queues |  11 +
 Documentation/networking/scaling.txt             |  57 ++++
 include/linux/cpumask.h                          |  11 +
 include/linux/netdevice.h                        | 100 ++++++++
 include/net/busy_poll.h                          |   1
 include/net/sock.h                               |  28 ++
 net/core/dev.c                                   | 283 +++++++++++++++-------
 net/core/net-sysfs.c                             |  85 ++++++-
 net/core/sock.c                                  |   4
 net/ipv4/tcp_input.c                             |   3
 10 files changed, 474 insertions(+), 109 deletions(-)