netdev - Possible sequence number corruption? 2.6.32-24


Message-ID: <AANLkTimxdEugiN9UM=9WdMJeL+1=2epruYOG9=6SjmO=@mail.gmail.com>
Date:	Thu, 16 Dec 2010 23:20:43 -0800
From:	"George B." <georgeb@...il.com>
To:	netdev <netdev@...r.kernel.org>
Subject: Possible sequence number corruption? 2.6.32-24

I have a box running an Ubuntu kernel (2.6.32-24-server #43-Ubuntu SMP).

It has 4 IP addresses on a bond interface.  Today we noticed a strange
problem with a load balancer attempting to connect to one of the
addresses.  We see a SYN from the balancer, the SYN-ACK goes back
from the Linux box, and then the balancer (a Citrix Netscaler running
8.1) immediately sends a reset.

After taking a packet capture on both machines I see this:

Netscaler sends (or thinks it sends) this:

Transmission Control Protocol, Src Port: 33860 (33860), Dst Port: 7080
(7080), Seq: 4246152065, Len: 0

Linux box sees this:

Transmission Control Protocol, Src Port: 33860 (33860), Dst Port: 7080
(7080), Seq: 1098970366, Len: 0

Note the sequence number has changed.  This is a flat layer2 network
between them.
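
The comparison above can be reproduced mechanically from the two
dissector summary lines.  What follows is only a minimal Python sketch,
assuming the summaries are pasted in as text in the format shown;
parse_summary and differing_fields are hypothetical helper names, not
part of tcpdump or Wireshark.

```python
import re

# Summary lines copied from the two captures quoted above.
SENT = ("Transmission Control Protocol, Src Port: 33860 (33860), "
        "Dst Port: 7080 (7080), Seq: 4246152065, Len: 0")
SEEN = ("Transmission Control Protocol, Src Port: 33860 (33860), "
        "Dst Port: 7080 (7080), Seq: 1098970366, Len: 0")

# Matches "Name: number" pairs for the fields that appear in these summaries.
FIELD = re.compile(r"(Src Port|Dst Port|Seq|Ack|Len): (\d+)")


def parse_summary(line):
    """Return a dict of the numeric fields found in one summary line."""
    return {name: int(value) for name, value in FIELD.findall(line)}


def differing_fields(a, b):
    """Return the sorted names of fields whose values differ."""
    pa, pb = parse_summary(a), parse_summary(b)
    return sorted(k for k in pa.keys() | pb.keys() if pa.get(k) != pb.get(k))


if __name__ == "__main__":
    # Lists which fields changed between sender and receiver captures.
    print(differing_fields(SENT, SEEN))
```

For these two lines, the ports and length agree and only the Seq field
differs.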

Linux replies with:

Transmission Control Protocol, Src Port: 7080 (7080), Dst Port: 33860
(33860), Seq: 1187929616, Ack: 1098970367, Len: 0

And the Netscaler, seeing an acknowledgment that does not match the
sequence number it sent, sends a RST.
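
The reset is what one would expect from a client that receives a
SYN-ACK acknowledging a sequence number it never sent: a valid SYN-ACK
acknowledges the client's initial sequence number plus one, modulo
2^32.  A rough sketch of that single check, using the numbers from the
captures (synack_acceptable is an illustrative name only, and a real
TCP stack performs further checks):

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def synack_acceptable(client_isn, synack_ack):
    """True if the SYN-ACK acknowledges exactly the client's SYN (ISN + 1)."""
    return synack_ack == (client_isn + 1) % MOD


NETSCALER_ISN = 4246152065   # Seq of the SYN in the Netscaler capture
LINUX_SEEN_ISN = 1098970366  # Seq of the same SYN in the Linux capture
LINUX_ACK = 1098970367       # Ack field of the SYN-ACK Linux sent back

if __name__ == "__main__":
    # From the Netscaler's point of view the Ack does not match its ISN.
    print(synack_acceptable(NETSCALER_ISN, LINUX_ACK))
    # From the Linux box's point of view the Ack is exactly right.
    print(synack_acceptable(LINUX_SEEN_ISN, LINUX_ACK))
```

So each end behaves consistently with the sequence number it believes
was in the SYN; the disagreement is about the SYN itself.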

Now if I connect to that same IP/Port from another Linux box, it works
fine.  If I connect from the load balancer to a different IP on the
same box it works fine.  If I connect from a different load balancer
to the troublesome IP/port, I get the same result.  It seems unlikely
that two different load balancers would be scrambled in exactly the
same fashion and only to one IP on the box.  In fact, there are two
different servers exhibiting the same problem when either of the load
balancers attempts to connect and it works great to other IP addresses
on those same two servers.  We have not experienced this problem with
these load balancers or this kernel version before, but it could be
some "magic" combination of numbers or something; I have seen stranger
things.

The Linux interface is a bond interface in balance-xor mode with the
default transmit hash.  The NICs use the bnx2 driver, v2.0.2.
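
For reference, the default transmit hash policy for balance-xor is
layer2, which the bonding documentation describes as (source MAC XOR
destination MAC) modulo slave count.  A minimal sketch of that formula
follows.  It assumes that only the last octet of each MAC contributes,
which is an assumption about the driver rather than something verified
here, and the MAC addresses are made up because the real ones are not
given in this report.

```python
def layer2_hash(src_mac, dst_mac, slave_count):
    """Pick an outgoing slave index from the last octet of each MAC.

    Assumption: only the final octet of each address is XORed.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % slave_count


# Hypothetical addresses, for illustration only.
BOND_MAC = "00:11:22:33:44:10"
BALANCER_MAC = "00:11:22:33:44:2b"

if __name__ == "__main__":
    # Index of the slave that would carry frames to this peer, two slaves.
    print(layer2_hash(BOND_MAC, BALANCER_MAC, 2))
```

Because this policy looks only at MAC addresses, replies to a given
peer leave through the same slave regardless of which of the four IP
addresses is involved.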

I am not physically at the network, so I cannot mirror a port and see
exactly what is on the wire, but I have taken packet captures using
tcpdump on the machines at both ends that I can provide if desired.
Has anyone else seen or heard of any odd sequence number corruption in
a similar configuration?

Thanks,

George