Date:   Sat, 10 Dec 2022 08:36:11 -0700
From:   Jens Axboe <>
To:     Linus Torvalds <>
Cc:     netdev <>,
        "" <>
Subject: [GIT PULL] Add support for epoll min wait time

Hi Linus,

I've had this done for months and have posted it a few times, but it has
received little attention. I'm sending it out for inclusion now, as having
it caught up in upstream limbo is preventing further use cases of it at
Meta. We upstream every feature that we develop, and we don't put any
features into our kernel that aren't already upstream, or on the way
upstream. This is obviously especially important when an API is

This adds an epoll_ctl method for setting the minimum wait time for
retrieving events. In production, workloads don't just run mostly idle
or mostly full tilt. A common pattern is medium load. epoll_wait and
friends receive a cap of max events and max time we want to wait for
them, but there's no notion of min events or min time. This leads to
services only getting a single event per wakeup, even if they are totally
fine with waiting e.g. 200 usec for more events. More events per wakeup
lead to greater efficiency in handling them.
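
To make the intended behavior concrete, here is a rough userspace
approximation of a "min wait". It is only a sketch and is not how the patches
implement it: the helper names epoll_wait_min() and sleep_usec() are made up
for illustration, it always sleeps for the full min wait unless the event
buffer is already full, and it assumes level-triggered registrations without
EPOLLONESHOT (the first non-blocking poll would otherwise consume events).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <time.h>

/* Sleep for `usec` microseconds, restarting if a signal interrupts us. */
static void sleep_usec(long usec)
{
    struct timespec ts = { .tv_sec = usec / 1000000L,
                           .tv_nsec = (usec % 1000000L) * 1000L };

    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/*
 * Illustrative helper (not a kernel or libc API): behave like epoll_wait(),
 * but give events up to `min_wait_usec` to accumulate before returning,
 * unless `maxevents` are already pending or the caller asked not to block.
 */
static int epoll_wait_min(int epfd, struct epoll_event *events, int maxevents,
                          int timeout_ms, long min_wait_usec)
{
    long nap = min_wait_usec;
    int n;

    assert(maxevents > 0);

    /* Non-blocking check: a full buffer or a zero timeout returns at once. */
    n = epoll_wait(epfd, events, maxevents, 0);
    if (n < 0 || n == maxevents || timeout_ms == 0)
        return n;

    /* Never sleep longer than the caller's overall timeout. */
    if (timeout_ms > 0 && nap > timeout_ms * 1000L)
        nap = timeout_ms * 1000L;
    sleep_usec(nap);

    /* Level-triggered fds are reported again, so this picks up the batch. */
    n = epoll_wait(epfd, events, maxevents, 0);
    if (n != 0)
        return n;

    /* Still nothing: ordinary blocking wait for what is left of the timeout. */
    if (timeout_ms > 0) {
        long left = timeout_ms - nap / 1000L;
        timeout_ms = left > 0 ? (int)left : 0;
    }
    return epoll_wait(epfd, events, maxevents, timeout_ms);
}
```

A caller would use epoll_wait_min(epfd, events, maxevents, timeout_ms, 200)
where it calls epoll_wait() today. Doing this in userspace costs extra
syscalls per wakeup and cannot return early when the buffer fills during the
sleep, which is part of why kernel support is attractive.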

The main patch has some numbers, but the tl;dr is that we see a nice
reduction in context switches per second, and a reduction in busy time on
such systems.

It has been suggested that a syscall should be available for this as
well. There are two main reasons why this wasn't pursued (though it
was still investigated):

- This most likely should've been done as epoll_pwait3(), as we already
  have epoll_wait, epoll_pwait, and epoll_pwait2. The latter two are
  already at the max number of syscall arguments, so a new method would
  have to be done where a struct would define the API. With some
  arguments being optional, this could get inefficient or ugly (or both).

- The main reason is that Meta doesn't need it. By using epoll_ctl, the
  check-for-support-of-feature can be relegated to setup time rather
  than in the fast path, and the workloads we are looking at would not
  need different min wait settings within a single epoll context.
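
The setup-time check described in the second point could look roughly like
the sketch below. Everything specific in it is an assumption made for
illustration: the opcode name HYPOTHETICAL_EPOLL_CTL_MIN_WAIT, its value, the
use of fd -1, and passing the wait time in data.u64 are placeholders, not
taken from the patches or from any released header. The struct poller type
and both helpers are likewise made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>

/* HYPOTHETICAL placeholder opcode; not defined by any released kernel header. */
#define HYPOTHETICAL_EPOLL_CTL_MIN_WAIT 4

struct poller {
    int epfd;
    int has_min_wait;   /* decided once, at setup time */
};

/* Create the epoll context and probe for min-wait support exactly once. */
static int poller_init(struct poller *p, uint64_t min_wait_usec)
{
    /* ASSUMPTION: the wait time travels in data.u64; the real ABI may differ. */
    struct epoll_event arg = { .events = 0, .data.u64 = min_wait_usec };

    assert(p != NULL);

    p->epfd = epoll_create1(EPOLL_CLOEXEC);
    if (p->epfd < 0)
        return -1;

    /* Any failure (EINVAL, EBADF, ...) is treated as "not supported". */
    p->has_min_wait =
        epoll_ctl(p->epfd, HYPOTHETICAL_EPOLL_CTL_MIN_WAIT, -1, &arg) == 0;
    return 0;
}

/* Fast path: a plain epoll_wait(), with no per-call feature detection. */
static int poller_wait(struct poller *p, struct epoll_event *evs,
                       int maxevents, int timeout_ms)
{
    return epoll_wait(p->epfd, evs, maxevents, timeout_ms);
}
```

On a kernel without the feature the probe simply fails and has_min_wait
stays 0, so the application keeps its existing behavior. Because the setting
lives on the epoll context, the wait call itself is unchanged either way.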

Please pull!

The following changes since commit ef4d3ea40565a781c25847e9cb96c1bd9f462bc6:

  afs: Fix server->active leak in afs_put_server (2022-11-30 10:02:37 -0800)

are available in the Git repository at:

  git:// tags/epoll-min_ts-2022-12-08

for you to fetch changes up to 73b9320234c0ad1b5e6f576abb796221eb088c64:

  eventpoll: ensure we pass back -EBADF for a bad file descriptor (2022-12-08 07:05:42 -0700)


Jens Axboe (8):
      eventpoll: cleanup branches around sleeping for events
      eventpoll: don't pass in 'timed_out' to ep_busy_loop()
      eventpoll: split out wait handling
      eventpoll: move expires to epoll_wq
      eventpoll: move file checking earlier for epoll_ctl()
      eventpoll: add support for min-wait
      eventpoll: add method for configuring minimum wait on epoll context
      eventpoll: ensure we pass back -EBADF for a bad file descriptor

 fs/eventpoll.c                 | 192 +++++++++++++++++++++++++++++++++--------
 include/linux/eventpoll.h      |   2 +-
 include/uapi/linux/eventpoll.h |   1 +
 3 files changed, 158 insertions(+), 37 deletions(-)

Jens Axboe
