Message-Id: <20240124025359.11419-1-jdamato@fastly.com>
Date: Wed, 24 Jan 2024 02:53:56 +0000
From: Joe Damato <jdamato@...tly.com>
To: netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Cc: chuck.lever@...cle.com,
	jlayton@...nel.org,
	linux-api@...r.kernel.org,
	brauner@...nel.org,
	edumazet@...gle.com,
	davem@...emloft.net,
	alexander.duyck@...il.com,
	sridhar.samudrala@...el.com,
	kuba@...nel.org,
	Joe Damato <jdamato@...tly.com>
Subject: [net-next 0/3] Per epoll context busy poll support
Greetings:
TL;DR: This series builds on commit bf3b9f6372c4 ("epoll: Add busy poll
support to epoll with socket fds.") by allowing user applications to enable
epoll-based busy polling and set a busy poll packet budget on a per epoll
context basis.
To allow for this, two ioctls have been added for epoll contexts for
getting and setting a new struct, struct epoll_params.
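As a rough illustration of how an application might use such ioctls, here
is a minimal sketch. The cover letter does not spell out the struct layout
or the ioctl names, so everything below is an assumption: the field layout
and ioctl numbers follow what I understand later appeared in mainline's
include/uapi/linux/eventpoll.h, and the version in this posting may differ.
The sketch_/SKETCH_ names are local stand-ins chosen to avoid clashing with
system headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>

/* ASSUMPTION: layout modeled on mainline's struct epoll_params (8 bytes). */
struct sketch_epoll_params {
	uint32_t busy_poll_usecs;   /* busy poll duration in microseconds */
	uint16_t busy_poll_budget;  /* max packets processed per poll */
	uint8_t  prefer_busy_poll;  /* assumed field; may not exist in this posting */
	uint8_t  pad;               /* must be zero */
};

/* ASSUMPTION: ioctl type and numbers as I believe mainline defines them. */
#define SKETCH_EPOLL_IOC_TYPE 0x8A
#define SKETCH_EPIOCSPARAMS \
	_IOW(SKETCH_EPOLL_IOC_TYPE, 0x01, struct sketch_epoll_params)
#define SKETCH_EPIOCGPARAMS \
	_IOR(SKETCH_EPOLL_IOC_TYPE, 0x02, struct sketch_epoll_params)

/* Set busy poll parameters on one epoll context. Returns 0 or -1 (errno set). */
static int set_busy_poll(int epfd, uint32_t usecs, uint16_t budget)
{
	struct sketch_epoll_params p;

	memset(&p, 0, sizeof(p));
	p.busy_poll_usecs = usecs;
	p.busy_poll_budget = budget;
	return ioctl(epfd, SKETCH_EPIOCSPARAMS, &p);
}

/* Read back the parameters of one epoll context. Returns 0 or -1 (errno set). */
static int get_busy_poll(int epfd, struct sketch_epoll_params *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	return ioctl(epfd, SKETCH_EPIOCGPARAMS, out);
}
```

On a kernel without this support the calls are expected to fail with errno
set, in which case the application would fall back to whatever the
system-wide sysctl dictates.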
This makes epoll-based busy polling much more usable for user
applications than the current system-wide sysctl and hardcoded budget.
Longer explanation:
Presently epoll has support for a very useful form of busy poll based on
the incoming NAPI ID (see also: SO_INCOMING_NAPI_ID [1]).
This form of busy poll allows epoll_wait to drive NAPI packet processing,
which enables a few interesting user application designs that can reduce
latency and potentially improve L2/L3 cache hit rates by deferring NAPI
until userland has finished its work.
The documentation available on this is, IMHO, a bit confusing so please
allow me to explain how one might use this:
1. Ensure each application thread has its own epoll instance mapping
1-to-1 with NIC RX queues. An n-tuple filter would likely be used to
direct connections with specific dest ports to these queues.
2. Optionally: Set up IRQ coalescing for the NIC RX queues where busy
polling will occur. This can help prevent the userland app from being
pre-empted by a hard IRQ while it is running. Note this means that
userland must take care to call epoll_wait regularly and not spend too
long between calls, since it now drives NAPI via epoll_wait.
3. Ensure that all incoming connections added to an epoll instance
have the same NAPI ID. This can be done with a BPF filter when
SO_REUSEPORT is used, or with getsockopt + SO_INCOMING_NAPI_ID when a
single accept thread dispatches incoming connections to worker threads.
4. Lastly, busy poll must be enabled via a sysctl
(/proc/sys/net/core/busy_poll).
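The single-accept-thread variant of step 3 can be sketched roughly as
follows. This is only an illustration: the worker table and the helper
names are hypothetical, and the fallback value for SO_INCOMING_NAPI_ID is
the generic one (some architectures use a different number, so prefer the
definition from the system headers).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56 /* generic value; some architectures differ */
#endif

/*
 * Query the NAPI ID recorded for a socket. Returns 0 on success, -1 with
 * errno set on failure. An ID of 0 means no NAPI ID has been recorded.
 */
static int incoming_napi_id(int fd, unsigned int *napi_id)
{
	socklen_t len = sizeof(*napi_id);

	assert(napi_id != NULL);
	*napi_id = 0;
	return getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len);
}

/*
 * Hypothetical dispatch helper: worker_napi_ids[i] holds the NAPI ID served
 * by worker thread i. Returns the worker index, or -1 if no worker owns
 * that NAPI ID.
 */
static int worker_for_napi_id(const unsigned int *worker_napi_ids,
			      size_t nworkers, unsigned int napi_id)
{
	size_t i;

	for (i = 0; i < nworkers; i++) {
		if (worker_napi_ids[i] == napi_id)
			return (int)i;
	}
	return -1;
}
```

The accept thread would call incoming_napi_id() on each accepted socket,
look up the owning worker, and hand the fd to that worker so it can add it
to its own epoll instance.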
The unfortunate part about step 4 above is that it enables busy poll
system-wide, which affects all user applications on the system,
including epoll-based network applications which were not intended to
be used this way and applications where increased CPU usage for lower
latency network processing is unnecessary or undesirable.
If the user wants to run one low latency epoll-based server application
with epoll-based busy poll, but would like to run the rest of the
applications on the system (which may also use epoll) without busy poll,
this system-wide sysctl presents a significant problem.
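To see whether a given machine is currently in this situation, an
application can read the sysctl directly. The sketch below is a minimal,
hedged example; the helper names are hypothetical, and it relies only on
the sysctl holding a busy poll time in microseconds, with 0 meaning
disabled.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysctl-style file. Returns 0 on success, -1 on failure. */
static int read_sysctl_long(const char *path, long *out)
{
	FILE *f;
	int matched;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	matched = fscanf(f, "%ld", out);
	fclose(f);
	return matched == 1 ? 0 : -1;
}

/*
 * Report the system-wide busy poll state:
 * 1 if enabled (value > 0), 0 if disabled, -1 if it could not be read.
 */
static int global_busy_poll_enabled(void)
{
	long usecs = 0;

	if (read_sysctl_long("/proc/sys/net/core/busy_poll", &usecs) != 0)
		return -1;
	return usecs > 0;
}
```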
This change preserves the system-wide sysctl, but adds a mechanism (via
ioctl) to enable or disable busy poll for epoll contexts as needed by
individual applications, making epoll-based busy poll more usable.
Thanks,
Joe
[1]: https://lore.kernel.org/lkml/20170324170836.15226.87178.stgit@localhost.localdomain/
Joe Damato (3):
  eventpoll: support busy poll per epoll instance
  eventpoll: Add per-epoll busy poll packet budget
  eventpoll: Add epoll ioctl for epoll_params
 .../userspace-api/ioctl/ioctl-number.rst      |  1 +
 fs/eventpoll.c                                | 99 ++++++++++++++++++-
 include/uapi/linux/eventpoll.h                | 12 +++
 3 files changed, 107 insertions(+), 5 deletions(-)
-- 
2.25.1