netdev - Missing TCP SYN on loopback, retransmits after 1s

Message-ID: <20111122181320.38a70cf8@telperion.jlyo.org>
Date:	Tue, 22 Nov 2011 18:13:20 -0600
From:	Jesse Young <jlyo@...o.org>
To:	netdev@...r.kernel.org
Subject: Missing TCP SYN on loopback, retransmits after 1s

Hi all,

I am experiencing packet loss over TCP/IPv[46], which causes 1 second
delays when connect()ing to a socket. This happens even on loopback, and
on multiple kernels. On the older kernels, the connect() time is nearly
3 seconds; I believe this is due to a recent change to the TCP connect
retransmit parameter in the kernel.

1. Linux dc-s1000-2114 2.6.32-35-server #78-Ubuntu SMP Tue Oct 11
    16:26:12 UTC 2011 x86_64 GNU/Linux
2. Linux dc-a1000-2131.cleversafelabs.com 2.6.39.4-2-clevos+ #1 SMP
    Tue Nov 8 09:06:49 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
3. Linux telperion.jlyo.org 3.1.0-4-ARCH #1 SMP PREEMPT Mon Nov 7
    22:47:18 CET 2011 x86_64 Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
    GenuineIntel GNU/Linux
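
As a rough illustration of why one lost SYN would show up as a delay of
about 1 second or about 3 seconds, the sketch below adds up the waits
before each SYN retransmission. It assumes the initial retransmission
timeout is 1 second on the newest kernel and 3 seconds on the older ones
(which is what the observed delays suggest, not something verified
against the kernel source) and that the timeout doubles after every
unanswered SYN. The function name is hypothetical.

```python
def connect_delay_after_lost_syns(lost_syns, initial_rto_s=1.0):
    """Seconds spent waiting before the SYN that finally gets through is sent.

    Assumption: exponential backoff, i.e. the retransmission timer starts
    at initial_rto_s and doubles after every unanswered SYN.
    """
    delay = 0.0
    rto = initial_rto_s
    for _ in range(lost_syns):
        delay += rto   # wait out the current timer
        rto *= 2       # back off before the next retry
    return delay


if __name__ == "__main__":
    for rto in (1.0, 3.0):
        delays = [connect_delay_after_lost_syns(n, rto) for n in range(4)]
        print("initial RTO %.0fs:" % rto, delays)
```

Under these assumptions a single lost SYN costs exactly one initial
timeout, which matches the ~1 s and ~3 s connect() times described above.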

I have created some test cases which reproduce this problem. The first
set of tests uses select() multiplexing and has some problems of its
own; however, those tests exhibit odd behavior as well, especially in
the difference between tcp4 and tcp6.

Please note: these tests will quickly exhaust the supply of available
ephemeral TCP ports on your system, which will cause any TCP connect()
calls in other processes to fail with EADDRNOTAVAIL. However, ports
will become available again after a short while.
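
To get a feel for how quickly that can happen, the sketch below reads
the ephemeral port range from /proc/sys/net/ipv4/ip_local_port_range
and reports how many ports it contains. The helper names are
hypothetical, and the sketch assumes a Linux system where that file
holds a "low high" pair.

```python
def parse_port_range(text):
    """Parse the 'low high' pair stored in ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def ephemeral_port_count(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Number of ports in the local (ephemeral) port range, inclusive."""
    with open(path) as f:
        low, high = parse_port_range(f.read())
    return high - low + 1


if __name__ == "__main__":
    print("ephemeral ports available:", ephemeral_port_count())
```

A connect() flood that opens and closes one connection per iteration
can use up a range of this size in a few seconds.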

The first test fails very quickly, while the others haven't timed out
so far.  NOTE: The second test requires /proc/sys/net/ipv6/bindv6only
to be set to 1.

./packetloss :: ::1
./packetloss :: 127.0.0.1
./packetloss 0.0.0.0 127.0.0.1
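
One way to confirm which bindv6only setting is in effect before running
the second test is to ask a freshly created IPv6 socket for its
IPV6_V6ONLY option, since on Linux the sysctl supplies that option's
default. This is only a sketch; the function name is hypothetical and
it is not taken from the packetloss code.

```python
import socket


def default_v6only():
    """Return True if new AF_INET6 sockets default to IPv6-only.

    On Linux this default comes from /proc/sys/net/ipv6/bindv6only.
    """
    with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
        return bool(s.getsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY))


if __name__ == "__main__":
    print("bindv6only in effect:", default_v6only())
```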

The other tests run a client and server in different processes.
Run the "closed" daemon using one of:
./closed ::
./closed 0.0.0.0

Then flood connect() pings against port 8009, the port closed listens on.
./tcping -f -p8009 ::1
./tcping -f -p8009 127.0.0.1

Wait for a pause, then ^C, and notice the max statistic is ~1000ms.
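
For readers who don't want to build tcping, the measurement can be
approximated with the sketch below: it times repeated connect() calls
and reports min/avg/max in milliseconds. This is not the tcping source;
the function names and the iteration count are assumptions made for
illustration, and it expects something (such as closed) to be listening
on the target port.

```python
import socket
import time


def time_connect(host, port, timeout=10.0):
    """Return the latency of one TCP connect() in milliseconds."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed again immediately
    return (time.monotonic() - start) * 1000.0


def summarize(samples_ms):
    """Return (min, avg, max) of a non-empty list of latencies."""
    return min(samples_ms), sum(samples_ms) / len(samples_ms), max(samples_ms)


def flood(host, port, count):
    """Connect count times back to back and summarize the latencies."""
    return summarize([time_connect(host, port) for _ in range(count)])


if __name__ == "__main__":
    lo, avg, hi = flood("127.0.0.1", 8009, 10000)
    print("min/avg/max = %.3f/%.3f/%.3f ms" % (lo, avg, hi))
```

A max close to 1000 ms alongside a sub-millisecond min would correspond
to the pause described above.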

These tests have also been run between machines on a relatively
noiseless ethernet LAN with similar results.

What's also puzzling is that I see no packet drops reported in
$ ifconfig lo
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 16436  metric 1
inet 127.0.0.1  netmask 255.0.0.0
inet6 ::1  prefixlen 128  scopeid 0x10<host>
loop  txqueuelen 0  (Local Loopback)
RX packets 276411482  bytes 15822880567 (14.7 GiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 276411482 bytes 15822880567 (14.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions
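
The same interface counters can be read programmatically from
/proc/net/dev, which makes it easy to compare them before and after a
test run. The sketch below is a minimal parser; the function name is
hypothetical and the column positions assume the usual Linux layout of
that file (rx drop is the 4th field after the colon, tx drop the 12th).

```python
def parse_dev_drops(proc_net_dev_text, iface="lo"):
    """Extract rx/tx drop counters for iface from /proc/net/dev content."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = [int(x) for x in rest.split()]
            # Layout: 8 receive columns followed by 8 transmit columns.
            return {"rx_dropped": fields[3], "tx_dropped": fields[11]}
    raise KeyError(iface)


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        print(parse_dev_drops(f.read(), "lo"))
```

Note that these are interface-level counters, so a SYN discarded higher
up in the stack would presumably not be counted here.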

I'm thinking this may be a bug in the TCP/IP stack; however, I'm not
certain whether I'm missing a socket option or some other configuration
that may eliminate this behavior.

If there's anything else I can help you with, please don't hesitate
to Cc me.

Thanks,
Jesse

Attached: syndrop.pcap

Get the code here
https://github.com/jlyo/packetloss
git clone git://github.com/jlyo/packetloss.git

https://github.com/jlyo/tcping
git clone git://github.com/jlyo/tcping.git

https://github.com/jlyo/closed
git clone git://github.com/jlyo/closed.git

Download attachment "syndrop.pcap" of type "application/vnd.tcpdump.pcap" (622 bytes)