Message-ID: <20250123231236.2657321-1-skhawaja@google.com>
Date: Thu, 23 Jan 2025 23:12:32 +0000
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, almasrymina@...gle.com
Cc: netdev@...r.kernel.org, skhawaja@...gle.com
Subject: [PATCH net-next v2 0/4] Add support to do threaded napi busy poll
Extend the already existing support of threaded napi poll to do continuous
busy polling.

This is used to continuously poll the napi and fetch descriptors from the
backing RX/TX queues for low latency applications. Allow enabling threaded
busy poll through netlink so it can be turned on for a set of dedicated
napis used by low latency applications.

This allows NAPI busy polling for any userspace application, independent
of the userspace API being used for packet and event processing (epoll,
io_uring, raw socket APIs). Once enabled, the user can fetch the PID of
the kthread doing the NAPI polling and set its affinity, priority and
scheduler according to the low-latency requirements.
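For example, the polling kthread can then be pinned and prioritized from
userspace; a minimal sketch, assuming the kthread keeps the existing
"napi/<dev>-<napi-id>" naming and that core 11 is a hypothetical dedicated
core:
```
# Find the NAPI polling kthread for eth0 (thread name: napi/<dev>-<napi-id>)
NAPI_PID=$(pgrep 'napi/eth0')
# Pin it to a dedicated core and give it a real-time priority
sudo taskset -pc 11 "$NAPI_PID"
sudo chrt -f -p 50 "$NAPI_PID"
```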
Currently threaded napi can only be enabled at the device level using
sysfs. Add support to enable/disable threaded mode for each napi
individually through the netlink interface, by extending the `napi-set`
op in the netlink spec with a `threaded` attribute.
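As an illustration, the per-napi threaded mode could then be set with the
YNL CLI; a sketch, assuming napi id 345 and that `threaded` takes the enum
values defined by this series (1 meaning threaded polling enabled):
```
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
    --do napi-set --json '{"id": 345, "threaded": 1}'
```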
Extend the threaded attribute in the napi struct with an option to enable
continuous busy polling. Extend the netlink and sysfs interfaces to allow
enabling/disabling threaded busy polling at the device or individual napi
level.
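For the device-wide case, the existing threaded sysfs knob gains a
busy-poll mode; a sketch, assuming this series maps threaded busy polling
to the value 2:
```
# 0 = threaded off, 1 = threaded napi, 2 = threaded busy poll (assumed mapping)
echo 2 | sudo tee /sys/class/net/eth0/threaded
```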
We use this for our AF_XDP based hard low-latency usecase using the
onload stack (https://github.com/Xilinx-CNS/onload) that runs in
userspace. Our usecase is fixed-frequency RPC style traffic with fixed
request/response sizes. We simulated this using neper by only starting
the next transaction once the last one has completed. The experiment
results are listed below.

Setup:
- Running on Google C3 VMs with the idpf driver and the following
  configuration.
- IRQ affinity and coalescing settings are the same for both experiments.
- Only 1 RX/TX queue is configured.
- The first experiment enables busy poll using sysctl for both epoll and
  socket APIs.
- The second experiment enables NAPI threaded busy poll for the full
  device using the sysfs interface.
Non-threaded NAPI busy poll enabled with the following sysctl and sysfs
settings:
```
echo 400 | sudo tee /proc/sys/net/core/busy_poll
echo 400 | sudo tee /proc/sys/net/core/busy_read
echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
```
Results using the following command:
```
sudo EF_NO_FAIL=0 EF_POLL_USEC=100000 taskset -c 3-10 onload -v \
--profile=latency ./neper/tcp_rr -Q 200 -R 400 -T 1 -F 50 \
-p 50,90,99,999 -H <IP> -l 10
...
...
num_transactions=2835
latency_min=0.000018976
latency_max=0.049642100
latency_mean=0.003243618
latency_stddev=0.010636847
latency_p50=0.000025270
latency_p90=0.005406710
latency_p99=0.049807350
latency_p99.9=0.049807350
```
Results with NAPI threaded busy poll using the following command:
```
sudo EF_NO_FAIL=0 EF_POLL_USEC=100000 taskset -c 3-10 onload -v \
--profile=latency ./neper/tcp_rr -Q 200 -R 400 -T 1 -F 50 \
-p 50,90,99,999 -H <IP> -l 10
...
...
num_transactions=460163
latency_min=0.000015707
latency_max=0.200182942
latency_mean=0.000019453
latency_stddev=0.000720727
latency_p50=0.000016950
latency_p90=0.000017270
latency_p99=0.000018710
latency_p99.9=0.000020150
```
Here, with NAPI threaded busy poll running on a separate core, we are
able to poll the NAPI consistently and keep latency to an absolute
minimum. We are also able to do this without any major changes to the
onload stack or its threading model.

v2:
- Add documentation in napi.rst.
- Provide experiment data and usecase details.
- Update busy_poller selftest to include napi threaded poll testcase.
- Define threaded mode enum in netlink interface.
- Include NAPI threaded state in napi config so it is saved/restored.
Samiullah Khawaja (4):
Add support to set napi threaded for individual napi
net: Create separate gro_flush helper function
Extend napi threaded polling to allow kthread based busy polling
selftests: Add napi threaded busy poll test in `busy_poller`
Documentation/ABI/testing/sysfs-class-net | 3 +-
Documentation/netlink/specs/netdev.yaml | 14 ++
Documentation/networking/napi.rst | 80 ++++++++++-
.../net/ethernet/atheros/atl1c/atl1c_main.c | 2 +-
include/linux/netdevice.h | 24 +++-
include/uapi/linux/netdev.h | 7 +
net/core/dev.c | 127 ++++++++++++++----
net/core/net-sysfs.c | 2 +-
net/core/netdev-genl-gen.c | 5 +-
net/core/netdev-genl.c | 9 ++
tools/include/uapi/linux/netdev.h | 7 +
tools/testing/selftests/net/busy_poll_test.sh | 25 +++-
tools/testing/selftests/net/busy_poller.c | 14 +-
13 files changed, 282 insertions(+), 37 deletions(-)
--
2.48.1.262.g85cc9f2d1e-goog