Message-ID: <20130919221850.77620129@samsung-9>
Date:	Thu, 19 Sep 2013 22:18:50 -0700
From:	Stephen Hemminger <stephen@...workplumber.org>
To:	netdev@...r.kernel.org
Subject: Fw: [Bug 61681] New: Incoming TCP4 connections fail to start, don't
 get past SYN_RECV and then quickly disappear



Begin forwarded message:

Date: Thu, 19 Sep 2013 09:42:15 -0700
From: "bugzilla-daemon@...zilla.kernel.org" <bugzilla-daemon@...zilla.kernel.org>
To: "stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: [Bug 61681] New: Incoming TCP4 connections fail to start, don't get past SYN_RECV and then quickly disappear


https://bugzilla.kernel.org/show_bug.cgi?id=61681

            Bug ID: 61681
           Summary: Incoming TCP4 connections fail to start, don't get
                    past SYN_RECV and then quickly disappear
           Product: Networking
           Version: 2.5
    Kernel Version: Linux xxxxxx 3.4.57-48.42.amzn1.x86_64 #1 SMP Mon Aug
                    12 21:43:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
          Hardware: IA-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: shemminger@...ux-foundation.org
          Reporter: dcrooke@...il.com
        Regression: No

This bug appears to be very rare, but entirely real, and it dates back a long
time. I tried to debug it thoroughly looking at both kernel and webserver
settings, and then got down to looking at netstat.

The Linux kernel can sometimes get into a state where it fails to complete
approximately 98% of incoming TCP connection attempts, and only correctly
processes about 2%. These numbers may be relevant, as others have reported
finding the same "1 in 50" ratio on much older kernels over the years.

I did not get a chance to capture traffic with iptables / pcap / Wireshark
(this is a production box, so we gave up quickly and tried a reboot). Other
folks with the same issue indicate that Linux is sending the wrong remote
sequence number back in the SYN-ACK packet, and the client simply drops it.
My experience is that the half-formed connection is torn down almost
immediately; I was running netstat in a continuous loop to see this. Others
have observed that their clients send RST in response to the malformed
SYN-ACK.

http://serverfault.com/questions/297134/server-not-sending-a-syn-ack-packet-in-response-to-a-syn-packet

http://ask.wireshark.org/questions/23885/rst-after-syn-ack
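
For anyone who wants to watch for the same symptom without eyeballing netstat
output, here is a minimal sketch that counts sockets per TCP state from
/proc/net/tcp, which exposes much of the same IPv4 state that netstat prints. The
function names are made up for illustration, and this is not the loop that was
run on the affected box. The state codes are the hex values printed in the
"st" column; 03 is SYN_RECV.

```python
from collections import Counter

# Hex state codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state given the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # Columns: sl, local_address, rem_address, st, ...
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def read_tcp_states(path="/proc/net/tcp"):
    """Read the live table (Linux only) and return per-state counts."""
    with open(path) as f:
        return count_tcp_states(f.read())
```

Calling read_tcp_states() once a second and printing the "SYN_RECV" count
would show entries appearing and vanishing quickly, if the behaviour matches
what is described above.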

For us, the problem went away on a reboot and so far has stayed away, so I am
wondering if it is a function of cumulative traffic. However, TCP sequence
number wraparound on the Linux end shouldn't cause this afaict: the kernel
should simply be acknowledging the sequence number that came in the SYN packet
(plus one, since the SYN itself consumes one sequence number).
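
To make the expected behaviour concrete, here is a small sketch of the check
one could apply to a captured SYN / SYN-ACK pair. It assumes only the standard
TCP rule that a SYN consumes one sequence number, so the SYN-ACK's
acknowledgment number should be the client's initial sequence number plus one,
modulo 2^32. The function names are hypothetical, and no capture from the
affected box exists to run this against.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def expected_ack(client_isn):
    """Acknowledgment number a correct SYN-ACK carries for this client ISN."""
    return (client_isn + 1) % SEQ_MOD


def synack_acks_syn(client_isn, synack_ack):
    """True if the SYN-ACK acknowledges the client's SYN correctly."""
    return synack_ack == expected_ack(client_isn)
```

For example, synack_acks_syn(1000, 1001) is True, and a client ISN of
0xFFFFFFFF wraps to an expected acknowledgment number of 0. A client that
receives a SYN-ACK failing this check will not treat it as a valid reply to
its SYN.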

A number of people have had very similar looking issues due to a broken
multi-path network config or a broken NAT device. Obviously this is not the
case here: Amazon knows how to do IT, this box only has one interface, and in
any case the Linux kernel is still responsible for the sequence number it
replies with.

-- 
You are receiving this mail because:
You are the assignee for the bug.