Message-ID: <794043b6-4008-448e-b241-1390aa91d2ab@gmail.com>
Date: Fri, 31 Jan 2025 13:54:25 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Max Kellermann <max.kellermann@...os.com>, axboe@...nel.dk,
 io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 8/8] io_uring: skip redundant poll wakeups

On 1/28/25 13:39, Max Kellermann wrote:
> Using io_uring with epoll is very expensive because every completion
> leads to a __wake_up() call, most of which are unnecessary because the
> polling process has already been woken up but has not had a chance to
> process the completions.  During this time, wq_has_sleeper() still
> returns true, therefore this check is not enough.

The poller is not required to call vfs_poll / io_uring_poll()
multiple times, in which case all subsequent events will be dropped
on the floor. E.g. the poller queues a poll entry in the first
io_uring_poll(), and then every time there is an event it'd do
vfs_read() or whatever without removing the entry.

I don't think we can make such assumptions about the poller, at
least not without some help from it / special-casing particular
callbacks.

...
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 137c2066c5a3..b65efd07e9f0 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
...
> @@ -2793,6 +2794,9 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
>   
>   	if (unlikely(!ctx->poll_activated))
>   		io_activate_pollwq(ctx);
> +
> +	atomic_set(&ctx->poll_wq_waiting, 1);

io_uring_poll()                 |
     poll_wq_waiting = 1         |
                                 | io_poll_wq_wake()
                                 |     poll_wq_waiting = 0
                                 |     // no waiters yet => no wake ups
                                 | <return to user space>
                                 |    <consume all cqes>
     poll_wait()                 |
     return; // no events        |
                                 | produce_cqes()
                                 | io_poll_wq_wake()
                                 |     if (poll_wq_waiting) wake();
                                 |     // it's still 0, wake up is lost


>   	/*
>   	 * provides mb() which pairs with barrier from wq_has_sleeper
>   	 * call in io_commit_cqring
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index f65e3f3ede51..186cee066f9f 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -287,7 +287,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
>   
>   static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
>   {
> -	if (wq_has_sleeper(&ctx->poll_wq))
> +	if (wq_has_sleeper(&ctx->poll_wq) && atomic_xchg_release(&ctx->poll_wq_waiting, 0) > 0)
>   		__wake_up(&ctx->poll_wq, TASK_NORMAL, 0,
>   				poll_to_key(EPOLL_URING_WAKE | EPOLLIN));
>   }

-- 
Pavel Begunkov

