lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 6 Jul 2019 18:16:57 +0200
From:   Carlo Wood <carlo@...noe.com>
To:     davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: Kernel BUG: epoll_wait() (and epoll_pwait) stall for 206 ms per
 call on sockets with a small-ish snd/rcv buffer.

I improved the test case a bit:

https://github.com/CarloWood/ai-evio-testsuite/blob/2a9ae49e3ae39eea7cb1d105883254370f53831b/src/epoll_bug.c

If the bug doesn't show, please increase burst_size and/or decrease
sndbuf_size and rcvbuf_size.

The output that I get with VERBOSE defined is for example:

[...snip...]
wrote 34784 bytes to 6 (total written now 9665792), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9665792), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9700576), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9700576), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9735360), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9735360), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9770144), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9770144), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9804928), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9804928), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9839712), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9839712), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9874496), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9874496), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9909280), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9909280), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9944064), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9944064), left in pipe line: 0
wrote 34784 bytes to 6 (total written now 9978848), now in pipe line: 34784
Read 34784 bytes from fd 5 (total read now 9978848), left in pipe line: 0
wrote 21152 bytes to 6 (total written now 10000000), now in pipe line: 21152
Read 21152 bytes from fd 5 (total read now 10000000), left in pipe line: 0
epoll_wait() stalled 193 milliseconds!
Read 21888 bytes from fd 6 (total read now 70912), left in pipe line: 21888
epoll_wait() stalled 255 milliseconds!
Read 21888 bytes from fd 6 (total read now 92800), left in pipe line: 0
write(5, buf, 9907200) = 43776 (total written now 136576), now in pipe line: 43776
epoll_wait() stalled 211 milliseconds!
Read 21888 bytes from fd 6 (total read now 114688), left in pipe line: 21888
epoll_wait() stalled 207 milliseconds!
write(5, buf, 9863424) = 38272 (total written now 174848), now in pipe line: 60160
Read 16384 bytes from fd 6 (total read now 131072), left in pipe line: 43776
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 152960), left in pipe line: 21888
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 174848), left in pipe line: 0
write(5, buf, 9825152) = 43776 (total written now 218624), now in pipe line: 43776
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 196736), left in pipe line: 21888
epoll_wait() stalled 211 milliseconds!
Read 21888 bytes from fd 6 (total read now 218624), left in pipe line: 0
write(5, buf, 9781376) = 43776 (total written now 262400), now in pipe line: 43776
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 240512), left in pipe line: 21888
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 262400), left in pipe line: 0
write(5, buf, 9737600) = 43776 (total written now 306176), now in pipe line: 43776
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 284288), left in pipe line: 21888
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 306176), left in pipe line: 0
write(5, buf, 9693824) = 43776 (total written now 349952), now in pipe line: 43776
epoll_wait() stalled 207 milliseconds!
Read 21888 bytes from fd 6 (total read now 328064), left in pipe line: 21888
epoll_wait() stalled 207 milliseconds!
write(5, buf, 9650048) = 38272 (total written now 388224), now in pipe line: 60160
Read 16384 bytes from fd 6 (total read now 344448), left in pipe line: 43776
epoll_wait() stalled 207 milliseconds!
... etc. etc.


It seems that the problem always occur right after stopping to write data in one
direction, and then happens for the way back.

In the case above the burst_size is set to 10000000 bytes, and it writes
that amount and reads it on the other side successfully.

What I think is going on however is that the data on the way back is stalling,
during which the "forward" burst finishes (without the epoll_wait stalling it
is VERY fast). Above you see:

Read 21152 bytes from fd 5 (total read now 10000000), left in pipe line: 0
epoll_wait() stalled 193 milliseconds!

But since we know that the stall always seems to be 207ms, I'm pretty sure
that actually it stalled 14 ms before that, and needed 14 ms to finish the
complete burst in one direction.

-- 
Carlo Wood <carlo@...noe.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ