Message-ID: <AANLkTimxdEugiN9UM=9WdMJeL+1=2epruYOG9=6SjmO=@mail.gmail.com>
Date:	Thu, 16 Dec 2010 23:20:43 -0800
From:	"George B." <georgeb@...il.com>
To:	netdev <netdev@...r.kernel.org>
Subject: Possible sequence number corruption? 2.6.32-24

I have a box running an Ubuntu kernel (2.6.32-24-server #43-Ubuntu SMP).

It has 4 IP addresses on a bond interface.  We noticed today a strange
problem with a load balancer attempting to connect to one of the
addresses.  We see a SYN from the balancer, a SYN-ACK going back from
the Linux box, and then an immediate RST from the balancer (Citrix
Netscaler running 8.1).

After taking a packet capture on both machines I see this:

Netscaler sends (or thinks it sends):

Transmission Control Protocol, Src Port: 33860 (33860), Dst Port: 7080 (7080), Seq: 4246152065, Len: 0

Linux box sees this:

Transmission Control Protocol, Src Port: 33860 (33860), Dst Port: 7080 (7080), Seq: 1098970366, Len: 0

Note the sequence number has changed.  This is a flat layer2 network
between them.

Linux replies with:

Transmission Control Protocol, Src Port: 7080 (7080), Dst Port: 33860 (33860), Seq: 1187929616, Ack: 1098970367, Len: 0

And the Netscaler, seeing an Ack (1098970367) that does not match the
sequence number it sent, sends a RST.
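
The mismatch can be checked with a little arithmetic. The following is a
minimal sketch, not anything taken from the captures themselves beyond the
three numbers quoted above; the function names are made up for illustration.
It relies only on the standard TCP rule that a SYN consumes one sequence
number, so a valid SYN-ACK acknowledges the SYN's sequence number plus one
(modulo 2^32).

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def expected_ack(syn_seq: int) -> int:
    """Return the Ack value a SYN-ACK should carry for a SYN with this Seq."""
    return (syn_seq + 1) % MOD


def seq_delta(sent: int, seen: int) -> int:
    """Return the offset between the sent and the received Seq, modulo 2^32."""
    return (seen - sent) % MOD


# Values quoted from the two captures in this message.
netscaler_syn_seq = 4246152065   # Seq the Netscaler believes it sent
linux_seen_syn_seq = 1098970366  # Seq the Linux box captured
linux_synack_ack = 1098970367    # Ack in the Linux box's SYN-ACK

# Linux answered correctly for the SYN it saw ...
print(expected_ack(linux_seen_syn_seq) == linux_synack_ack)  # True
# ... but that Ack is wrong from the Netscaler's point of view, hence the RST.
print(expected_ack(netscaler_syn_seq) == linux_synack_ack)   # False
# Offset between what was sent and what was seen.
print(seq_delta(netscaler_syn_seq, linux_seen_syn_seq))      # 1147785597
```

If the same offset showed up on every failing connection, that would point at
something rewriting the field consistently rather than at random corruption;
whether it does is not known from a single capture.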

Now if I connect to that same IP/port from another Linux box, it works
fine.  If I connect from the load balancer to a different IP on the
same box, it works fine.  If I connect from a different load balancer
to the troublesome IP/port, I get the same result.  It seems unlikely
that two different load balancers would scramble the sequence number in
exactly the same fashion, and only toward one IP on the box.  In fact,
two different servers exhibit the same problem when either of the load
balancers attempts to connect, and connections work fine to the other
IP addresses on those same two servers.  We have not experienced this
problem with these load balancers or this kernel version on Linux
before, but it could be some "magic" combination of numbers.  I have
seen stranger things.

The Linux interface is a bond interface in balance-xor mode with the
default transmit hash.  The NICs use the bnx2 driver, v2.0.2.
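
For readers unfamiliar with balance-xor, here is a rough sketch of how the
outgoing slave is chosen. It assumes the "default transmit hash" is the
layer2 policy as described in the kernel bonding documentation, (source MAC
XOR destination MAC) modulo slave count. The actual kernel code may use only
part of the MAC address, and the MAC addresses and function names below are
hypothetical. Note that this hash only selects the slave for frames leaving
the Linux box; it does not touch TCP headers.

```python
def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address string to an integer."""
    return int(mac.replace(":", ""), 16)


def layer2_xmit_slave(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index as (source MAC XOR destination MAC) % slave count.

    This follows the documented layer2 formula and is an illustration only,
    not a copy of the kernel implementation.
    """
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


# Hypothetical addresses: one bond MAC and two different peers.
bond_mac = "00:11:22:33:44:55"
peer_a = "00:11:22:33:44:66"
peer_b = "00:11:22:33:44:67"

print(layer2_xmit_slave(bond_mac, peer_a, 2))  # 1
print(layer2_xmit_slave(bond_mac, peer_b, 2))  # 0
```

With two slaves, the result depends only on the lowest bit of the XOR, so
different peers on the same segment can be answered through different NICs.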

I am not physically at the network, so I cannot mirror a port and see
exactly what is on the wire, but I have taken packet captures using
tcpdump on the machines at both ends that I can provide if desired.
Has anyone else seen or heard of any odd sequence number corruption in
a similar configuration?
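
For anyone comparing the two captures field by field, the following is a
minimal sketch of pulling the relevant values out of a raw TCP header. It
assumes the caller has already stripped the Ethernet and IP headers and
passes in the TCP segment bytes; the function name is made up for
illustration, and the sample segment is built by hand from the SYN values
quoted earlier in this message (the window value is arbitrary) rather than
read from the actual capture files.

```python
import struct

# Flag bits in the low byte of the offset/flags field.
FLAG_SYN = 0x02
FLAG_RST = 0x04
FLAG_ACK = 0x10


def parse_tcp_header(segment: bytes) -> dict:
    """Parse the fixed 20-byte TCP header and return the fields of interest."""
    if len(segment) < 20:
        raise ValueError("TCP header needs at least 20 bytes")
    (src_port, dst_port, seq, ack, off_flags,
     _window, _checksum, _urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # data offset is in 32-bit words
        "syn": bool(off_flags & FLAG_SYN),
        "ack_flag": bool(off_flags & FLAG_ACK),
        "rst": bool(off_flags & FLAG_RST),
    }


# Hand-built SYN matching what the Netscaler reports sending.
sample_syn = struct.pack("!HHIIHHHH", 33860, 7080, 4246152065, 0,
                         (5 << 12) | FLAG_SYN, 5840, 0, 0)
fields = parse_tcp_header(sample_syn)
print(fields["seq"], fields["syn"], fields["rst"])  # 4246152065 True False
```

Running the same parse over the SYN as seen at each end would show whether
only the sequence number differs or whether other header fields changed too.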

Thanks,

George
