[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250416185826.26375-1-jdamato@fastly.com>
Date: Wed, 16 Apr 2025 18:58:25 +0000
From: Joe Damato <jdamato@...tly.com>
To: linux-fsdevel@...r.kernel.org
Cc: jack@...e.cz,
brauner@...nel.org,
Joe Damato <jdamato@...tly.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Sridhar Samudrala <sridhar.samudrala@...el.com>,
Alexander Duyck <alexander.h.duyck@...el.com>,
linux-kernel@...r.kernel.org (open list)
Subject: [PATCH vfs/vfs.fixes v2] eventpoll: Set epoll timeout if it's in the future
Avoid an edge case where epoll_wait arms a timer and calls schedule()
even if the timer will expire immediately.
For example: if the user has specified an epoll busy poll usecs which is
equal or larger than the epoll_wait/epoll_pwait2 timeout, it is
unnecessary to call schedule_hrtimeout_range; the busy poll usecs have
consumed the entire timeout duration so it is unnecessary to induce
scheduling latency by calling schedule() (via schedule_hrtimeout_range).
This can be measured using a simple bpftrace script:
tracepoint:sched:sched_switch
/ args->prev_pid == $1 /
{
print(kstack());
print(ustack());
}
Before this patch is applied:
Testing an epoll_wait app with busy poll usecs set to 1000, and
epoll_wait timeout set to 1ms using the script above shows:
__traceiter_sched_switch+69
__schedule+1495
schedule+32
schedule_hrtimeout_range+159
do_epoll_wait+1424
__x64_sys_epoll_wait+97
do_syscall_64+95
entry_SYSCALL_64_after_hwframe+118
epoll_wait+82
Which is unexpected; the busy poll usecs should have consumed the
entire timeout and there should be no reason to arm a timer.
After this patch is applied: the same test scenario does not generate a
call to schedule() in the above edge case. If the busy poll usecs are
reduced (for example usecs: 100, epoll_wait timeout 1ms) the timer is
armed as expected.
Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
Signed-off-by: Joe Damato <jdamato@...tly.com>
Reviewed-by: Jan Kara <jack@...e.cz>
---
v2:
- No longer an RFC and rebased on vfs/vfs.fixes
- Added Jan's Reviewed-by
- Added Fixes tag
- No functional changes from the RFC
rfcv1: https://lore.kernel.org/linux-fsdevel/20250415184346.39229-1-jdamato@fastly.com/
fs/eventpoll.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 100376863a44..4bc264b854c4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1996,6 +1996,14 @@ static int ep_try_send_events(struct eventpoll *ep,
return res;
}
+static int ep_schedule_timeout(ktime_t *to)
+{
+ if (to)
+ return ktime_after(*to, ktime_get());
+ else
+ return 1;
+}
+
/**
* ep_poll - Retrieves ready events, and delivers them to the caller-supplied
* event buffer.
@@ -2103,7 +2111,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
write_unlock_irq(&ep->lock);
- if (!eavail)
+ if (!eavail && ep_schedule_timeout(to))
timed_out = !schedule_hrtimeout_range(to, slack,
HRTIMER_MODE_ABS);
__set_current_state(TASK_RUNNING);
base-commit: a681b7c17dd21d5aa0da391ceb27a2007ba970a4
--
2.43.0
Powered by blists - more mailing lists