[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACSApvZT5F4F36jLWEc5v_AbqZVQpmH1W7UK21tB9nPu=OtmBA@mail.gmail.com>
Date: Fri, 18 Jul 2025 09:38:27 +0100
From: Soheil Hassas Yeganeh <soheil@...gle.com>
To: Nam Cao <namcao@...utronix.de>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Shuah Khan <shuah@...nel.org>, Davidlohr Bueso <dave@...olabs.net>,
Khazhismel Kumykov <khazhy@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
Eric Dumazet <edumazet@...gle.com>, Jens Axboe <axboe@...nel.dk>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH 2/2] eventpoll: Fix epoll_wait() report false negative
On Fri, Jul 18, 2025 at 8:52 AM Nam Cao <namcao@...utronix.de> wrote:
>
> ep_events_available() checks for available events by looking at ep->rdllist
> and ep->ovflist. However, this is done without a lock, therefore the
> returned value is not reliable. Because it is possible that both checks on
> ep->rdllist and ep->ovflist are false while ep_start_scan() or
> ep_done_scan() is being executed on other CPUs, despite events are
> available.
>
> This bug can be observed by:
>
> 1. Create an eventpoll with at least one ready level-triggered event
>
> 2. Create multiple threads who do epoll_wait() with zero timeout. The
> threads do not consume the events, therefore all epoll_wait() should
> return at least one event.
>
> If one thread is executing ep_events_available() while another thread is
> executing ep_start_scan() or ep_done_scan(), epoll_wait() may wrongly
> return no event for the former thread.
That is the whole point of epoll_wait with a zero timeout. We would want to
opportunistically poll without much overhead, which will have more
false positives.
A caller that calls with a zero timeout should retry later, and will
at some point
observe the event.
I'm not sure if we would want to add much more overheads, for higher precision.
Thanks,
Soheil
> This reproducer is implemented as TEST(epoll65) in
> tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
>
> Fix it by skipping ep_events_available(), just call ep_try_send_events()
> directly.
>
> epoll_sendevents() (io_uring) suffers the same problem, fix that as well.
>
> There is still ep_busy_loop() who uses ep_events_available() without lock,
> but it is probably okay (?) for busy-polling.
>
> Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
> Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
> Fixes: ae3a4f1fdc2c ("eventpoll: add epoll_sendevents() helper")
> Signed-off-by: Nam Cao <namcao@...utronix.de>
> Cc: stable@...r.kernel.org
> ---
> fs/eventpoll.c | 16 ++--------------
> 1 file changed, 2 insertions(+), 14 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 0fbf5dfedb24..541481eafc20 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2022,7 +2022,7 @@ static int ep_schedule_timeout(ktime_t *to)
> static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> int maxevents, struct timespec64 *timeout)
> {
> - int res, eavail, timed_out = 0;
> + int res, eavail = 1, timed_out = 0;
> u64 slack = 0;
> wait_queue_entry_t wait;
> ktime_t expires, *to = NULL;
> @@ -2041,16 +2041,6 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> timed_out = 1;
> }
>
> - /*
> - * This call is racy: We may or may not see events that are being added
> - * to the ready list under the lock (e.g., in IRQ callbacks). For cases
> - * with a non-zero timeout, this thread will check the ready list under
> - * lock and will add to the wait queue. For cases with a zero
> - * timeout, the user by definition should not care and will have to
> - * recheck again.
> - */
> - eavail = ep_events_available(ep);
> -
> while (1) {
> if (eavail) {
> res = ep_try_send_events(ep, events, maxevents);
> @@ -2496,9 +2486,7 @@ int epoll_sendevents(struct file *file, struct epoll_event __user *events,
> * Racy call, but that's ok - it should get retried based on
> * poll readiness anyway.
> */
> - if (ep_events_available(ep))
> - return ep_try_send_events(ep, events, maxevents);
> - return 0;
> + return ep_try_send_events(ep, events, maxevents);
> }
>
> /*
> --
> 2.39.5
>
Powered by blists - more mailing lists