[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <30638.1198647786@turing-police.cc.vt.edu>
Date: Wed, 26 Dec 2007 00:43:06 -0500
From: Valdis.Kletnieks@...edu
To: Andrew Morton <akpm@...ux-foundation.org>,
Paul Moore <paul.moore@...com>
Cc: linux-kernel@...r.kernel.org
Subject: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage
On Sat, 22 Dec 2007 23:30:56 PST, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I've bisected it down this far:
kvm-ist-kaput.patch GOOD
git-lblnet.patch
git-lblnet-fixup.patch
git-leds.patch
git-libata-all.patch
git-libata-all-fix-pata_winbond-borkage.patch
git-libata-all-wtf.patch BAD
and somehow, I doubt the leds or libata trees horked up networking. ;)
Symptoms - semi-sporadic failures in making network connections. The test
case that tripped it up was the 'make test' from the Tcl 8.5 - several of the
test cases will create a listening socket, and then try to connect to it.
Under 2.6.24-rc5-mm1, it works just fine, but I'm seeing hangs under -rc6-mm1.
Doing a 'netstat -n -a -A inet -p' while it's hung shows me this:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:34118 0.0.0.0:* LISTEN 2236/tcltest
tcp 0 1 127.0.0.1:59460 127.0.0.1:34118 SYN_SENT 2236/tcltest
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:47842 0.0.0.0:* LISTEN 2352/tcltest
tcp 0 1 127.0.0.1:46510 127.0.0.1:47842 SYN_SENT 2352/tcltest
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:47842 0.0.0.0:* LISTEN 2352/tcltest
tcp 0 1 127.0.0.1:46510 127.0.0.1:47842 SYN_SENT 2352/tcltest
Pretty consistent failure mode - a socket is in 'listen', and the connection
gets hung in 'SYN_SENT'. There's 3 outputs listed - the first one from one run
of the test case, the second 2 are some 20 seconds apart on the same run.
It's pretty obvious that if you can't complete a 3-packet handshake to loopback
in 20 seconds, something is hosed. However, it's apparently some sort of
race/timing issue, as many *other* test cases in the Tcl test tree do in fact
work OK.
I already checked, it's not a slam-dunk to just 'patch -R' as there's 3 or 4
conflicts where later patches need massaging/reverting as well.
It's a problem with both 'classic RCU' and 'preempt RCU' (that was my *first*
guess as to the cause).
Any clues/hints/advice/patches?
Content of type "application/pgp-signature" skipped
Powered by blists - more mailing lists