Date:   Sat, 8 Apr 2017 16:56:46 -0700
From:   Joseph Lynch <joe.e.lynch@...il.com>
To:     davem@...emloft.net
Cc:     netdev@...r.kernel.org, phil@....cc
Subject: Performance regression on loopback devices when qdiscs are attached
 in 4.4

Hello,

I apologize if this is the wrong list for this report; I did not find
a more specific entry in the MAINTAINERS file. I think this is a
kernel issue and not an issue with my distro, but if you disagree I
can redirect this report as appropriate.

I am upgrading some Linux 4.2 servers to Linux 4.4 (Ubuntu Xenial),
and during testing I'm observing occasional TCP segment retransmits
on the loopback device, leading to 200ms latency spikes. I don't
observe the issue on non-loopback devices, and I believe I've
narrowed it down to qdiscs on loopback.

It seems that when a queuing discipline other than noqueue is attached
to a loopback device in 4.4+ kernels, packets will (very occasionally)
get dropped completely, leading to a retransmit. I'm not sure how this
can happen, and I've been trying to figure out what's going on. If
anyone has any pointers or suggestions, I'd very much appreciate them.
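
For anyone wanting to confirm which qdisc is attached to lo before
running a test, the following is a minimal sketch and not part of the
attached script. It assumes the iproute2 `tc` binary is installed and
that `tc qdisc show dev lo` prints lines of the usual form
"qdisc <kind> <handle> dev <name> ...". The helper names qdisc_kinds
and loopback_qdiscs are hypothetical.

```python
import subprocess


def qdisc_kinds(tc_output, dev):
    """Return the qdisc kinds listed for `dev` in `tc qdisc show` output."""
    kinds = []
    for line in tc_output.splitlines():
        fields = line.split()
        # Expected shape: qdisc <kind> <handle> dev <name> ...
        if (len(fields) >= 5 and fields[0] == "qdisc"
                and fields[3] == "dev" and fields[4] == dev):
            kinds.append(fields[1])
    return kinds


def loopback_qdiscs():
    """Run `tc qdisc show dev lo` and return the kinds attached to lo."""
    result = subprocess.run(
        ["tc", "qdisc", "show", "dev", "lo"],
        capture_output=True, text=True, check=True,
    )
    return qdisc_kinds(result.stdout, "lo")
```

If loopback_qdiscs() returns anything other than ["noqueue"], the
device is in the configuration described above.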

I've attached the script I'm using to reproduce the bug and an example
ab run that I believe shows it. In particular, the max timings of
200ms in the ab output and the TCP segment retransmits (and sometimes
RSTs) in the tcpdump output are what indicate the issue to me. I have
tested on 3.13, 4.2 and 4.4 kernels, and only 4.4 shows the issue.
Furthermore, non-loopback interfaces don't appear to have the bug. So
I ran git diff v4.2..v4.4 drivers/net/loopback.c, and the only commit
that seems to touch loopback.c is e65db2b7. I'm attempting to revert
the change and recompile to see if that commit triggers the bug, but I
don't understand why that change would break things in this way, so
that's just a guess.
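
As a lighter-weight cross-check than reading tcpdump output, the
kernel's TCP retransmit counter can be sampled before and after a
test run. The sketch below is not the attached repro.sh; it assumes a
Linux /proc/net/snmp with the usual pair of "Tcp:" lines (a header
line of field names followed by a line of values), and the helper
names tcp_counters and read_retrans_segs are hypothetical. The
counter is system-wide, not per-interface, so it is only meaningful
on an otherwise quiet machine.

```python
def tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines of /proc/net/snmp into a dict."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("expected a Tcp: header line and a Tcp: value line")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def read_retrans_segs(path="/proc/net/snmp"):
    """Return the current system-wide count of retransmitted TCP segments."""
    with open(path) as f:
        return tcp_counters(f.read())["RetransSegs"]
```

Calling read_retrans_segs() once before the ab run and once after,
then subtracting, gives the number of segments retransmitted during
the run.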

I'm continuing to try to debug this, but I figured it would be a good
idea to report it here in case someone with more familiarity may know
what's going on. Please let me know if there is any additional
information I can provide or tests I can run.

Thank you,
-Joey Lynch

View attachment "pfifo.txt" of type "text/plain" (1531 bytes)

Download attachment "repro.sh" of type "application/x-sh" (820 bytes)

View attachment "noqueue.txt" of type "text/plain" (1531 bytes)
