Message-ID: <1e50e45cfc832320999f21a81790a060@suse.de>
Date:   Tue, 25 Jun 2019 13:07:02 +0200
From:   Roman Penyaev <rpenyaev@...e.de>
To:     Eric Wong <e@...24.org>
Cc:     Jason Baron <jbaron@...mai.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Azat Khuzhin <azat@...event.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 00/14] epoll: support pollable epoll from userspace

On 2019-06-25 02:24, Eric Wong wrote:
> Roman Penyaev <rpenyaev@...e.de> wrote:
>> Hi all,
> 
> +cc Jason Baron
> 
>> ** Limitations
> 
> <snip>
> 
>> 4. No support for EPOLLEXCLUSIVE
>>      If device does not pass pollflags to wake_up() there is no way to
>>      call poll() from the context under spinlock, thus special work is
>>      scheduled to offload polling.  In this specific case we can't
>>      support exclusive wakeups, because we do not know actual result
>>      of scheduled work and have to wake up every waiter.
> 
> Lacking EPOLLEXCLUSIVE support is probably a showstopper for
> common applications using per-task epoll combined with
> non-blocking accept4() (e.g. nginx).

For the 'accept' case it seems SO_REUSEPORT can be used:

    https://lwn.net/Articles/542629/

Although I've never tried it in an O_NONBLOCK + epoll scenario.
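
A minimal sketch of what the SO_REUSEPORT approach could look like with
non-blocking sockets and a per-worker epoll instance.  The helper names
reuseport_listener() and worker_epoll() are hypothetical, the loopback
address is an arbitrary choice for illustration, and this is untested in
a real multi-worker server:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a non-blocking listen socket bound to
 * 127.0.0.1:port with SO_REUSEPORT set, so that several workers can
 * each own a listener on the same port.  Returns the fd, or -1.
 */
static int reuseport_listener(uint16_t port)
{
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
        .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
    };
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);

    if (fd < 0)
        return -1;
    /* SO_REUSEPORT has to be set before bind() on every socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: create a private epoll instance that watches
 * only this worker's own listener.  Returns the epoll fd, or -1.
 */
static int worker_epoll(int lfd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    int epfd;

    assert(lfd >= 0);
    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}
```

Each worker would call reuseport_listener() with the same port and then
worker_epoll() on its own fd, so no listen socket is shared between epoll
instances and the kernel distributes incoming connections among the
listeners.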

But I've just dived into the add-wait-exclusive logic again, and it
seems possible to support EPOLLEXCLUSIVE by iterating over all "epis"
for the particular fd that has been woken up.

For now I want to leave it as is, just to avoid overcomplicating the code.

> Fwiw, I'm still a weirdo who prefers a dedicated thread doing
> blocking accept4 for distribution between tasks (so epoll never
> sees a listen socket).  But, depending on what runtime/language
> I'm using, I can't always dedicate a blocking thread, so I
> recently started using EPOLLEXCLUSIVE from Perl5 where I
> couldn't rely on threads being available.
> 
> 
> If I could dedicate time to improving epoll; I'd probably
> add writev() support for batching epoll_ctl modifications
> to reduce syscall traffic, or pick-up the kevent()-like interface
> started long ago:
> https://lore.kernel.org/lkml/1393206162-18151-1-git-send-email-n1ght.4nd.d4y@gmail.com/
> (but I'm not sure I want to increase the size of the syscall table).

There is also the fresh fs/io_uring.c thingy, which supports polling and
batching (among other IO things).  But polling there acts only as a
single-shot, so it might make sense to add event subscription support
there instead of resurrecting kevent and co.

--
Roman





