Message-ID: <20200203151536.caf6n4b2ymvtssmh@tux>
Date:   Mon, 3 Feb 2020 16:15:36 +0100
From:   Max Neunhoeffer <max@...ngodb.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        Roman Penyaev <rpenyaev@...e.de>,
        Christopher Kohlhoff <chris.kohlhoff@...arpool.io>
Subject: Re: epoll_wait misses edge-triggered eventfd events: bug in Linux
 5.3 and 5.4

Dear Jakub and all,

I have done a git bisect and found that this commit introduced the epoll
bug:

https://github.com/torvalds/linux/commit/a218cc4914209ac14476cb32769b31a556355b22

I have Cc'ed the author of the commit.

This is plausible, since the commit introduces a new rwlock to reduce
contention in ep_poll_callback. I do not fully understand the details,
but the change appears closely related to this bug.

I have also verified that the bug is still present in the latest master
branch in Linus' repository.

Furthermore, Chris Kohlhoff has provided yet another reproducing program
that uses standard level-triggered events with epoll_wait instead of
edge-triggered ones. This makes the bug all the more urgent, since
more programs could run into this problem and end up with sleeping
barbers.

I have added all the details to the Bugzilla bug report:

  https://bugzilla.kernel.org/show_bug.cgi?id=205933

Hopefully, with this much information, we can now resolve this.

Best regards,
  Max.

On 20/02/01 12:16, Jakub Kicinski wrote:
> On Fri, 31 Jan 2020 14:57:30 +0100, Max Neunhoeffer wrote:
> > Dear All,
> > 
> > I believe I have found a bug in Linux 5.3 and 5.4 in epoll_wait/epoll_ctl
> > when an eventfd together with edge-triggered or the EPOLLONESHOT policy
> > is used. If an epoll_ctl call to rearm the eventfd happens approximately
> > at the same time as the epoll_wait goes to sleep, the event can be lost, 
> > even though proper protection through a mutex is employed.
> > 
> > The details together with two programs showing the problem can be found
> > here:
> > 
> >   https://bugzilla.kernel.org/show_bug.cgi?id=205933
> > 
> > Older kernels seem not to have this problem, although I did not test all
> > versions. I know that 4.15 and 5.0 do not show the problem.
> > 
> > Note that this method of using epoll_wait/eventfd is used by
> > boost::asio to wake up event loops in case a new completion handler
> > is posted to an io_service, so this is probably relevant for many
> > applications.
> > 
> > Any help with this would be appreciated.
> 
> Could be networking related but let's CC FS folks just in case.
> 
> Would you be able to perform bisection to narrow down the search 
> for a buggy change?
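
For illustration only, here is a minimal sketch in C of the wakeup
pattern described in the original report: an eventfd registered with
EPOLLONESHOT, a producer that writes to it, and a consumer that calls
epoll_wait and then rearms the eventfd with epoll_ctl. This is not one
of the reproducers attached to the Bugzilla entry and not boost::asio
code; the names struct waker, waker_init, waker_post and waker_wait are
hypothetical. The reported race needs the rearm and the epoll_wait to
run concurrently in different threads, which this single-threaded
sketch does not attempt; it only shows the sequence of calls involved.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper type: one epoll instance watching one eventfd. */
struct waker {
    int epfd; /* epoll instance */
    int efd;  /* eventfd used as the wakeup channel */
};

/* Create the eventfd and register it one-shot. Returns 0 or -1. */
int waker_init(struct waker *w)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT };

    w->epfd = epoll_create1(EPOLL_CLOEXEC);
    if (w->epfd < 0)
        return -1;
    w->efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (w->efd < 0) {
        close(w->epfd);
        return -1;
    }
    ev.data.fd = w->efd;
    if (epoll_ctl(w->epfd, EPOLL_CTL_ADD, w->efd, &ev) < 0) {
        close(w->efd);
        close(w->epfd);
        return -1;
    }
    return 0;
}

/* Producer side: add 1 to the eventfd counter to request a wakeup. */
int waker_post(const struct waker *w)
{
    uint64_t one = 1;

    return write(w->efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/*
 * Consumer side: wait up to timeout_ms for a wakeup.
 * Returns 1 if a wakeup was consumed, 0 on timeout, -1 on error.
 */
int waker_wait(const struct waker *w, int timeout_ms)
{
    struct epoll_event ev;
    uint64_t count;
    int n = epoll_wait(w->epfd, &ev, 1, timeout_ms);

    if (n <= 0)
        return n;

    /* Drain the counter so the eventfd is no longer readable. */
    if (read(w->efd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;

    /* EPOLLONESHOT disabled the fd after delivery, so rearm it. */
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = w->efd;
    if (epoll_ctl(w->epfd, EPOLL_CTL_MOD, w->efd, &ev) < 0)
        return -1;
    return 1;
}
```

In a real event loop, waker_post would be called from other threads
while the loop thread sits in waker_wait; that interleaving, where the
rearm coincides with epoll_wait going to sleep, is where the report
says a wakeup can be lost on 5.3 and 5.4.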
