Message-ID: <CAAywjhT6p9hGNC+VurGvi=jHq+7saKeEMpdxVuQvpFAUosx4=A@mail.gmail.com>
Date: Tue, 4 Feb 2025 16:14:41 -0800
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>, "David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, almasrymina@...gle.com
Cc: netdev@...r.kernel.org, Joe Damato <jdamato@...tly.com>,
Martin Karsten <mkarsten@...terloo.ca>
Subject: Re: [PATCH net-next v3 0/4] Add support to do threaded napi busy poll
On Tue, Feb 4, 2025 at 4:10 PM Samiullah Khawaja <skhawaja@...gle.com> wrote:
>
> Extend the existing support for threaded napi poll to do continuous
> busy polling.
>
> This is used to continuously poll a napi and fetch descriptors from the
> backing RX/TX queues for low-latency applications. Allow enabling of
> threaded busy poll through netlink so it can be turned on for a set of
> dedicated napis used by low-latency applications.
>
> It allows enabling NAPI busy poll for any userspace application,
> independent of the userspace API being used for packet and event
> processing (epoll, io_uring, raw socket APIs). Once enabled, the user
> can fetch the PID of the kthread doing NAPI polling and set its
> affinity, priority and scheduler depending on the low-latency
> requirements.
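>
> For example (device name, CPU and priority values below are purely
> illustrative), the polling kthread can be pinned and prioritized from
> userspace:
> ```
> # Threaded NAPI kthreads are named napi/<dev>-<napi-id>.
> pid=$(pgrep 'napi/eth0')   # or read the pid attribute via netlink napi-get
> taskset -pc 2 "$pid"       # pin the poller to CPU 2
> chrt -f -p 50 "$pid"       # run it with SCHED_FIFO priority 50
> ```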
>
> Currently threaded napi can only be enabled at the device level using
> sysfs. Add support to enable/disable threaded mode for an individual
> napi through the netlink interface. Extend the `napi-set` op in the
> netlink spec to allow setting the `threaded` attribute of a napi.
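>
> As an illustrative sketch (the napi id and the attribute value are made
> up; the actual value follows the threaded mode enum defined in the
> netdev spec), a single napi could then be switched to threaded mode with
> the ynl CLI:
> ```
> ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
>     --do napi-set --json '{"id": 345, "threaded": 1}'
> ```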
>
> Extend the threaded attribute in the napi struct with an option to
> enable continuous busy polling. Extend the netlink and sysfs interfaces
> to allow enabling/disabling threaded busy polling at the device or
> individual napi level.
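>
> At the device level this looks like the following sketch (the value
> selecting busy-poll mode is illustrative of the extended `threaded`
> attribute):
> ```
> # 0 = threaded off, 1 = threaded, 2 = threaded busy poll (illustrative)
> echo 2 | sudo tee /sys/class/net/eth0/threaded
> ```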
>
> We use this for our AF_XDP based hard low-latency use case using the
> onload stack (https://github.com/Xilinx-CNS/onload) that runs in
> userspace. Our use case is fixed-frequency RPC-style traffic with fixed
> request/response sizes. We simulated this using neper, starting the next
> transaction only when the previous one has completed. The experiment
> results are listed below.
>
> Setup:
>
> - Running on Google C3 VMs with the idpf driver, with the following configuration.
> - IRQ affinity and coalescing settings are the same for both experiments.
> - Only 1 RX/TX queue is configured.
> - The first experiment enables busy poll using sysctl for both the epoll and
> socket APIs.
> - The second experiment enables NAPI threaded busy poll for the full device
> using sysfs.
>
> Non-threaded NAPI busy poll enabled using sysctl and sysfs:
> ```
> echo 400 | sudo tee /proc/sys/net/core/busy_poll
> echo 400 | sudo tee /proc/sys/net/core/busy_read
> echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> ```
>
> Results using the following command:
> ```
> sudo EF_NO_FAIL=0 EF_POLL_USEC=100000 taskset -c 3-10 onload -v \
> --profile=latency ./neper/tcp_rr -Q 200 -R 400 -T 1 -F 50 \
> -p 50,90,99,999 -H <IP> -l 10
>
> ...
> ...
>
> num_transactions=2835
> latency_min=0.000018976
> latency_max=0.049642100
> latency_mean=0.003243618
> latency_stddev=0.010636847
> latency_p50=0.000025270
> latency_p90=0.005406710
> latency_p99=0.049807350
> latency_p99.9=0.049807350
> ```
>
> Results with NAPI threaded busy poll, using the following command:
> ```
> sudo EF_NO_FAIL=0 EF_POLL_USEC=100000 taskset -c 3-10 onload -v \
> --profile=latency ./neper/tcp_rr -Q 200 -R 400 -T 1 -F 50 \
> -p 50,90,99,999 -H <IP> -l 10
>
> ...
> ...
>
> num_transactions=460163
> latency_min=0.000015707
> latency_max=0.200182942
> latency_mean=0.000019453
> latency_stddev=0.000720727
> latency_p50=0.000016950
> latency_p90=0.000017270
> latency_p99=0.000018710
> latency_p99.9=0.000020150
> ```
>
> Here, with NAPI threaded busy poll running on a separate core, we are
> able to poll the NAPI consistently and keep latency to an absolute
> minimum. We are also able to do this without any major changes to the
> onload stack or its threading model.
>
> v3:
> - Fixed calls to dev_set_threaded in drivers
>
> v2:
> - Add documentation in napi.rst.
> - Provide experiment data and usecase details.
> - Update busy_poller selftest to include napi threaded poll testcase.
> - Define threaded mode enum in netlink interface.
> - Included NAPI threaded state in napi config to save/restore.
>
> Samiullah Khawaja (4):
> Add support to set napi threaded for individual napi
> net: Create separate gro_flush helper function
> Extend napi threaded polling to allow kthread based busy polling
> selftests: Add napi threaded busy poll test in `busy_poller`
>
> Documentation/ABI/testing/sysfs-class-net | 3 +-
> Documentation/netlink/specs/netdev.yaml | 14 ++
> Documentation/networking/napi.rst | 80 ++++++++++-
> .../net/ethernet/atheros/atl1c/atl1c_main.c | 2 +-
> drivers/net/ethernet/mellanox/mlxsw/pci.c | 2 +-
> drivers/net/ethernet/renesas/ravb_main.c | 2 +-
> drivers/net/wireless/ath/ath10k/snoc.c | 2 +-
> include/linux/netdevice.h | 24 +++-
> include/uapi/linux/netdev.h | 7 +
> net/core/dev.c | 127 ++++++++++++++----
> net/core/net-sysfs.c | 2 +-
> net/core/netdev-genl-gen.c | 5 +-
> net/core/netdev-genl.c | 9 ++
> tools/include/uapi/linux/netdev.h | 7 +
> tools/testing/selftests/net/busy_poll_test.sh | 25 +++-
> tools/testing/selftests/net/busy_poller.c | 14 +-
> 16 files changed, 285 insertions(+), 40 deletions(-)
>
> --
> 2.48.1.362.g079036d154-goog
>
Adding Joe and Martin as they requested to be CC'd in the next
revision. It seems I missed them when sending this out :(.