Message-ID: <1e50e45cfc832320999f21a81790a060@suse.de>
Date: Tue, 25 Jun 2019 13:07:02 +0200
From: Roman Penyaev <rpenyaev@...e.de>
To: Eric Wong <e@...24.org>
Cc: Jason Baron <jbaron@...mai.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Azat Khuzhin <azat@...event.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 00/14] epoll: support pollable epoll from userspace
On 2019-06-25 02:24, Eric Wong wrote:
> Roman Penyaev <rpenyaev@...e.de> wrote:
>> Hi all,
>
> +cc Jason Baron
>
>> ** Limitations
>
> <snip>
>
>> 4. No support for EPOLLEXCLUSIVE
>> If a device does not pass pollflags to wake_up(), there is no way to
>> call poll() from a context that holds a spinlock, so special work is
>> scheduled to offload polling. In this specific case we can't
>> support exclusive wakeups, because we do not know the actual result
>> of the scheduled work and have to wake up every waiter.
>
> Lacking EPOLLEXCLUSIVE support is probably a showstopper for
> common applications using per-task epoll combined with
> non-blocking accept4() (e.g. nginx).

For the accept() case it seems SO_REUSEPORT can be used:
https://lwn.net/Articles/542629/
although I've never tried it in an O_NONBLOCK + epoll scenario.
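A minimal sketch of that SO_REUSEPORT setup, assuming IPv4 loopback and one listener plus one private epoll instance per worker. It is not part of the patchset, the helper names (open_reuseport_listener, bound_port, worker_poll_once) are hypothetical, and, as noted above, the O_NONBLOCK + epoll combination has not been tried. Because no listen socket is shared between epoll instances, there is nothing for EPOLLEXCLUSIVE to arbitrate.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking TCP listener on 127.0.0.1:port with
 * SO_REUSEPORT set before bind(), so several workers can each own a
 * listener on the same port. Pass port 0 to get an ephemeral port.
 * Returns the fd, or -1 on error. */
int open_reuseport_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: the local port a socket is bound to, or 0. */
uint16_t bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Hypothetical worker step: wait on this worker's own epoll instance,
 * then drain its own listener with non-blocking accept4() until it
 * would block. Returns how many connections were accepted; they are
 * closed here for brevity, where a real worker would add them to epfd. */
int worker_poll_once(int epfd, int listen_fd, int timeout_ms)
{
    struct epoll_event ev;
    if (epoll_wait(epfd, &ev, 1, timeout_ms) <= 0)
        return 0;

    int accepted = 0;
    for (;;) {
        int c = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
        if (c < 0) {
            if (errno == EINTR)
                continue;
            break;          /* EAGAIN: queue drained; anything else: give up */
        }
        close(c);
        accepted++;
    }
    return accepted;
}
```

One trade-off: plain SO_REUSEPORT picks a listener by hashing the connection's 4-tuple, so connections queued on a busy worker's listener are not handed over to idle workers.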
That said, I've just dived into the add-wait-exclusive logic again, and it
seems possible to support EPOLLEXCLUSIVE by iterating over all "epis"
for the particular fd that has been woken up.
For now I'd prefer to leave it as is, to avoid overcomplicating the code.

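For comparison, a sketch of the per-task EPOLLEXCLUSIVE pattern Eric describes, as it works with today's epoll, assuming one shared non-blocking listener and one private epoll instance per worker. The helper names (open_shared_listener, add_exclusive_listener, accept_ready) are hypothetical. EPOLLEXCLUSIVE only promises to wake one or more waiters, so accept4() failing with EAGAIN has to be treated as normal.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: one non-blocking TCP listener on 127.0.0.1, created
 * once and shared by all workers. Stores the ephemeral port in *port_out. */
int open_shared_listener(uint16_t *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Each worker calls this once on its private epoll instance. EPOLLEXCLUSIVE
 * asks the kernel to wake one or more, rather than all, of the epoll
 * instances watching listen_fd. It is accepted only with EPOLL_CTL_ADD. */
int add_exclusive_listener(int epfd, int listen_fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    ev.data.fd = listen_fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
}

/* Call after epoll_wait() reports listen_fd. Another woken worker may have
 * taken the connection already, so EAGAIN is normal, not an error.
 * Stores up to max_fds new descriptors and returns how many were stored. */
int accept_ready(int listen_fd, int *out_fds, int max_fds)
{
    int n = 0;
    while (n < max_fds) {
        int c = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
        if (c >= 0) {
            out_fds[n++] = c;
        } else if (errno != EINTR) {
            break;      /* EAGAIN: nothing left; other errors: stop */
        }
    }
    return n;
}
```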
> Fwiw, I'm still a weirdo who prefers a dedicated thread doing
> blocking accept4 for distribution between tasks (so epoll never
> sees a listen socket). But, depending on what runtime/language
> I'm using, I can't always dedicate a blocking thread, so I
> recently started using EPOLLEXCLUSIVE from Perl5 where I
> couldn't rely on threads being available.
>
>
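A sketch of the dedicated-acceptor design described above, assuming POSIX threads and round-robin placement; struct acceptor, the helper names and the max_conns stop condition are hypothetical. The listener stays blocking and is never registered with epoll: each accepted connection is added straight into one worker's epoll set, which epoll_ctl() permits even while that worker is blocked in epoll_wait().

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical: everything the acceptor thread needs. */
struct acceptor {
    int listen_fd;            /* blocking listener, never added to epoll */
    const int *worker_epfd;   /* one private epoll fd per worker */
    size_t nworkers;
    size_t max_conns;         /* stop after this many; 0 = run forever */
};

/* Hypothetical helper: blocking TCP listener on 127.0.0.1 (ephemeral port). */
int open_blocking_listener(uint16_t *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Blocking accept4() loop: each new connection is made non-blocking and
 * registered directly in the next worker's epoll set (round-robin), so
 * workers only ever see connected sockets.
 * Returns the number of connections handed off. */
size_t acceptor_run(const struct acceptor *a)
{
    size_t handed = 0, next = 0;
    while (a->max_conns == 0 || handed < a->max_conns) {
        int c = accept4(a->listen_fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
        if (c < 0) {
            if (errno == EINTR || errno == ECONNABORTED)
                continue;
            break;
        }
        struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };
        ev.data.fd = c;
        if (epoll_ctl(a->worker_epfd[next], EPOLL_CTL_ADD, c, &ev) < 0) {
            close(c);
            continue;
        }
        next = (next + 1) % a->nworkers;
        handed++;
    }
    return handed;
}

/* Hypothetical thread entry point, meant to be passed to pthread_create(). */
void *acceptor_thread(void *arg)
{
    acceptor_run(arg);
    return NULL;
}
```

A program would start it with something like pthread_create(&tid, NULL, acceptor_thread, &cfg), where tid and cfg are illustrative names.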
> If I could dedicate time to improving epoll, I'd probably
> add writev() support for batching epoll_ctl modifications
> to reduce syscall traffic, or pick-up the kevent()-like interface
> started long ago:
> https://lore.kernel.org/lkml/1393206162-18151-1-git-send-email-n1ght.4nd.d4y@gmail.com/
> (but I'm not sure I want to increase the size of the syscall table).
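As a rough illustration of the syscall traffic in question: with today's API every interest-list change is its own epoll_ctl() call. The hypothetical change list below queues changes and flushes them once per event-loop iteration; its flush loop is exactly the part that a batched interface (the writev() idea above, which is only a proposal here) would collapse into a single submission. The struct names and the capacity of 64 are assumptions.

```c
#include <errno.h>      /* callers inspect errno after a failed flush */
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>

#define CHANGELIST_CAP 64    /* arbitrary illustrative capacity */

/* Hypothetical record for one pending interest-list change. */
struct epoll_change {
    int op;                  /* EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL */
    int fd;
    struct epoll_event ev;
};

/* Hypothetical user-space change list, flushed once per loop iteration. */
struct changelist {
    int epfd;
    size_t len;
    struct epoll_change buf[CHANGELIST_CAP];
};

/* Apply all queued changes in order. Each entry still costs one
 * epoll_ctl() syscall; this loop is what a batched kernel interface
 * would replace with a single call. Returns 0, or -1 with errno from the
 * first failing epoll_ctl() (later entries are discarded). */
int changelist_flush(struct changelist *cl)
{
    int rc = 0;
    for (size_t i = 0; i < cl->len; i++) {
        struct epoll_change *c = &cl->buf[i];
        if (epoll_ctl(cl->epfd, c->op, c->fd, &c->ev) < 0) {
            rc = -1;
            break;
        }
    }
    cl->len = 0;
    return rc;
}

/* Queue one change, flushing first if the buffer is full. */
int changelist_push(struct changelist *cl, int op, int fd, uint32_t events)
{
    if (cl->len == CHANGELIST_CAP && changelist_flush(cl) < 0)
        return -1;
    struct epoll_change *c = &cl->buf[cl->len++];
    c->op = op;
    c->fd = fd;
    c->ev.events = events;
    c->ev.data.fd = fd;
    return 0;
}
```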

There is also the fresh fs/io_uring.c thingy, which supports polling and
batching (among other I/O things). But polling there is single-shot
only, so it might make sense to add event subscription there
instead of resurrecting kevent and co.

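io_uring is not shown here; as an analogy only, epoll's EPOLLONESHOT has similar single-shot semantics. After one report the registration is disabled until it is re-armed with EPOLL_CTL_MOD, so every event costs an extra syscall, which is the overhead a persistent event subscription would avoid. The helper names (arm_oneshot, wait_and_rearm, oneshot_demo) are hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register (op = EPOLL_CTL_ADD) or re-arm
 * (op = EPOLL_CTL_MOD) fd for a single notification. */
int arm_oneshot(int epfd, int fd, uint32_t events, int op)
{
    struct epoll_event ev = { .events = events | EPOLLONESHOT };
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Wait for one event and re-arm its fd straight away. Every delivered
 * event costs two syscalls (epoll_wait + EPOLL_CTL_MOD); a persistent
 * subscription needs only the first. Returns 1 with *ready_fd set,
 * 0 on timeout, or -1 on error. */
int wait_and_rearm(int epfd, uint32_t events, int timeout_ms, int *ready_fd)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0)
        return n;
    if (arm_oneshot(epfd, ev.data.fd, events, EPOLL_CTL_MOD) < 0)
        return -1;
    *ready_fd = ev.data.fd;
    return 1;
}

/* Demonstration on a pipe: the unread byte keeps the read end readable,
 * yet after one report the entry stays silent until it is re-armed.
 * Returns 0 if exactly that sequence is observed. */
int oneshot_demo(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    struct epoll_event ev;
    int ok = epfd >= 0 &&
             arm_oneshot(epfd, p[0], EPOLLIN, EPOLL_CTL_ADD) == 0 &&
             write(p[1], "x", 1) == 1 &&
             epoll_wait(epfd, &ev, 1, 1000) == 1 &&  /* first report */
             epoll_wait(epfd, &ev, 1, 0) == 0 &&     /* disarmed: silent */
             arm_oneshot(epfd, p[0], EPOLLIN, EPOLL_CTL_MOD) == 0 &&
             epoll_wait(epfd, &ev, 1, 0) == 1;       /* re-armed: reported */
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```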
--
Roman