Date:   Sat, 28 Mar 2020 12:21:55 -0700
From:   Randy Dunlap <rdunlap@...radead.org>
To:     Omar Kilani <omar.kilani@...il.com>, linux-kernel@...r.kernel.org
Cc:     Davidlohr Bueso <dave@...olabs.net>
Subject: Re: Weird issue with epoll and kernel >= 5.0

On 3/28/20 11:10 AM, Omar Kilani wrote:
> Hi there,
> 
> I've observed an issue with epoll and kernels 5.0 and above when a
> system is generating a lot of epoll events.
> 
> I see this issue with nginx and jvm / netty based apps (using the
> jvm's native epoll support as well as netty's own optimized epoll
> support) but *not* with haproxy (?).
> 
> I'm not really sure what the actual problem is (nginx complains about
> epoll_wait with a generic error), but it doesn't happen on 4.19.x and
> lower.
> 
> I thought it was a netty problem at first and opened this ticket:
> 
> https://github.com/netty/netty/issues/8999
> 
> But then saw the same issue in nginx.
> 
> I haven't debugged a kernel issue in something like 20 years so I'm
> not really sure where to start myself.
> 
> I'd be more than happy to provide my test case that has a very quick
> repro to anyone who needs it.

Hi,
Please do.

> Also happy to provide a VM/machine with enough CPUs to trigger it
> easily (it seems to happen quicker with more CPUs present) to test
> with.


There have been around 10 changes in fs/eventpoll.c since v5.0 was
released in March 2019, so it would be helpful if you could test
the latest mainline kernel to see if the problem is still present.

Hm, it looks like you have identified this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.1-rc5&id=c5a282e9635e9c7382821565083db5d260085e3e
as the/a problem.

I have Cc-ed the patch author also.

-- 
~Randy
