Message-ID: <b0901cba-3cb8-a309-701e-7b8cb13f0e8a@kernel.dk>
Date: Sat, 10 Dec 2022 08:36:11 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: netdev <netdev@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: [GIT PULL] Add support for epoll min wait time

Hi Linus,

I've had this done for months and have posted it a few times, but it
has received little attention. Sending it out for inclusion now, as
having it caught up in upstream limbo is preventing further use cases of
it at Meta. We upstream every feature that we develop, and we don't put
any features into our kernel that aren't already upstream, or on the way
upstream. This is obviously especially important when an API is
involved.

This adds an epoll_ctl method for setting the minimum wait time for
retrieving events. In production, workloads don't just run mostly idle
or mostly full tilt. A common pattern is medium load. epoll_wait and
friends take a cap on the max number of events and the max time we want
to wait for them, but there's no notion of min events or min time. This
leads to services only getting a single event per wakeup, even if they
are totally fine with waiting e.g. 200 usec for more events. More events
per wakeup lead to greater efficiency in handling them.
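
To make the "min time" idea concrete, here is a rough userspace approximation in Python using the standard select.epoll wrapper. It is only a sketch of the semantics described above, not the mechanism this series adds: the helper name wait_with_min_time and its parameters are made up for illustration, it assumes level-triggered registrations (so events not yet consumed are reported again by the second poll), and it pays for an extra sleep and an extra poll call that an in-kernel minimum wait would not need.

```python
import select
import time


def wait_with_min_time(ep, max_events, min_wait_s, timeout_s):
    """Hypothetical helper: approximate a minimum wait on an epoll object.

    Blocks up to timeout_s for the first event. If fewer than max_events
    descriptors are ready, it lingers until min_wait_s has passed since the
    call started and polls once more to pick up anything that arrived in
    the meantime. Assumes level-triggered registrations.
    """
    start = time.monotonic()

    # Ordinary epoll behaviour: return as soon as anything is ready.
    events = ep.poll(timeout_s, max_events)
    if not events or len(events) >= max_events:
        return events

    # Give further events a chance to arrive within the min wait window.
    remaining = min_wait_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
        # Non-blocking re-poll; keep the first batch if nothing is reported.
        events = ep.poll(0, max_events) or events

    return events
```

A caller would replace ep.poll(timeout, max_events) with wait_with_min_time(ep, max_events, 200e-6, timeout) to trade up to 200 usec of latency for larger batches.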

The main patch has some numbers, but the tl;dr is that we see a nice
reduction in context switches / second, and a reduction in busy time on
such systems.

It has been suggested that a syscall should be available for this as
well. There are two main reasons why this wasn't pursued (though it was
still investigated):

- This most likely should've been done as epoll_pwait3(), as we already
  have epoll_wait, epoll_pwait, and epoll_pwait2. The latter two are
  already at the max number of syscall arguments, so a new method would
  have to be done where a struct would define the API. With some
  arguments being optional, this could get inefficient or ugly (or both).

- The main reason is that Meta doesn't need it. By using epoll_ctl, the
  check-for-support-of-feature can be relegated to setup time rather
  than in the fast path, and the workloads we are looking at would not
  need different min wait settings within a single epoll context.

Please pull!


The following changes since commit ef4d3ea40565a781c25847e9cb96c1bd9f462bc6:

  afs: Fix server->active leak in afs_put_server (2022-11-30 10:02:37 -0800)

are available in the Git repository at:

  git://git.kernel.dk/linux.git tags/epoll-min_ts-2022-12-08

for you to fetch changes up to 73b9320234c0ad1b5e6f576abb796221eb088c64:

  eventpoll: ensure we pass back -EBADF for a bad file descriptor (2022-12-08 07:05:42 -0700)

----------------------------------------------------------------
epoll-min_ts-2022-12-08
----------------------------------------------------------------
Jens Axboe (8):
      eventpoll: cleanup branches around sleeping for events
      eventpoll: don't pass in 'timed_out' to ep_busy_loop()
      eventpoll: split out wait handling
      eventpoll: move expires to epoll_wq
      eventpoll: move file checking earlier for epoll_ctl()
      eventpoll: add support for min-wait
      eventpoll: add method for configuring minimum wait on epoll context
      eventpoll: ensure we pass back -EBADF for a bad file descriptor

 fs/eventpoll.c                 | 192 +++++++++++++++++++++++++++++++++--------
 include/linux/eventpoll.h      |   2 +-
 include/uapi/linux/eventpoll.h |   1 +
 3 files changed, 158 insertions(+), 37 deletions(-)

--
Jens Axboe