Message-ID: <Pine.LNX.4.64.0806161542290.16829@wrl-59.cs.helsinki.fi>
Date: Mon, 16 Jun 2008 16:21:25 +0300 (EEST)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: Didier Raboud <didier@...oud.com>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Netdev <netdev@...r.kernel.org>, bugme-daemon@...zilla.kernel.org
Subject: Re: [Bugme-new] [Bug 10903] New: ssh connections hang with 2.6.26-rc5
On Sun, 15 Jun 2008, Didier Raboud wrote:
> On Saturday 14 June 2008 22:45:41, Ilpo Järvinen wrote:
> > On Fri, 13 Jun 2008, Andrew Morton wrote:
> > > (switched to email. Please respond via emailed reply-to-all, not via the
> > > bugzilla web interface).
>
> OK.
>
> > > On Fri, 13 Jun 2008 02:39:17 -0700 (PDT) bugme-daemon@...zilla.kernel.org wrote:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=10903
> > > >
> > > > Summary: ssh connections hang with 2.6.26-rc5
> > > > Product: Networking
> > > > Version: 2.5
> > > > KernelVersion: 2.6.26-rc5
> > > > Platform: All
> > > > OS/Version: Linux
> > > > Tree: Mainline
> > > > Status: NEW
> > > > Severity: normal
> > > > Priority: P1
> > > > Component: Other
> > > > AssignedTo: acme@...stprotocols.net
> > > > ReportedBy: didier@...oud.com
> > > >
> > > >
> > > > Latest working kernel version: 2.6.25-2
> > > > Earliest failing kernel version: 2.6.26-rc5
> > > > Distribution: Debian (Lenny + Sid)
> > > > Hardware Environment: amd64 (Dell Latitude D630)
> > > > Software Environment: KDE
> > > > Problem Description:
> > > >
> > > > With kernel version 2.6.26-rc5, the ssh connections to remote servers
> > > > randomly hang (no error message). No improvement despite the activation
> > > > of "ServerAliveInterval" on both sides.
> >
> > Thanks for reporting. Could you please clarify couple of things:
>
> Hi.
>
> I will try, as far as my time and knowledge allow.
>
> > Does this only happen with a particular server/servers?
>
> I have only tried with two of my home servers. One runs 2.6.22-4-686 and the
> other 2.6.18-6-vserver-686.
>
> > Any middleboxes in between (NAT, firewall, etc.)?
>
> There is an ADSL router which "provides" Internet access to the servers via NAT.
> I have tried from "inside" the house (so in the same subnet) and from
> outside: it hangs in both cases.
...OK. NAT devices have timeouts for idle sessions, so with active or
kept-alive sessions that shouldn't be a problem.
> The common factor is my use of "iwl3945": I have always made the ssh
> connections over WiFi.
>
> > Do all ssh connections hang simultaneously?
>
> Well... It is hard to say. As far as I have seen, no. When one connection
> hangs, I can still successfully connect to the same server.
It's quite likely that the hangs are independent, but it was still worth
confirming.
> > How long have you waited until concluding that TCP is "hung"?
>
> Well. The "ServerAliveInterval" option of openssh now leads to "Received
> disconnect from $IP: 2: Timeout, your session not responding." after the
> hang. So the openssh server notices that my session is not responding and so
> cuts the connection.
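For reference, OpenSSH's alive checks give up after roughly interval x count-max seconds without a reply. The sshd_config man page's own example is ClientAliveInterval 15 with the default ClientAliveCountMax of 3, which disconnects an unresponsive client after about 45 seconds; ServerAliveInterval/ServerAliveCountMax on the client side work the same way. A trivial sketch of that estimate (the helper name is made up, and the result is approximate):

```python
# Hypothetical helper: rough time an unresponsive ssh session survives before
# OpenSSH's alive checks drop it. Assumes the documented behaviour: one probe
# every `interval` seconds, disconnect after `count_max` unanswered probes.

def seconds_until_disconnect(interval: int, count_max: int = 3) -> int:
    if interval <= 0:
        # An interval of 0 disables the alive probes, so there is no timeout.
        raise ValueError("interval must be positive (0 disables alive probes)")
    return interval * count_max


if __name__ == "__main__":
    # sshd_config(5) example: ClientAliveInterval 15, default CountMax 3
    print(seconds_until_disconnect(15))  # 45
```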
>
> > Is TSO enabled (ethtool -k)? Have you tried without it?
>
> It doesn't seem so:
>
> ----
> # ethtool -k wlan0
> Offload parameters for wlan0:
> Cannot get device rx csum settings: Operation not supported
> Cannot get device tx csum settings: Operation not supported
> Cannot get device scatter-gather settings: Operation not supported
> Cannot get device tcp segmentation offload settings: Operation not supported
> Cannot get device udp large send offload settings: Operation not supported
> Cannot get device generic segmentation offload settings: Operation not supported
> no offload info available
> ----
>
> > It wouldn't hurt to include info about the eth hw too (e.g., lspci); it
> > might turn out to be unneeded at some point, but it might save an email
> > round-trip.
>
> lspci attached.
...Thanks for all the details. I especially appreciated the kernel
versions of the servers since TCP has two end hosts (and I forgot to
ask)... :-)
> > TCP can appear to hang for a vast number of reasons. The only recent
> > changes that are suspect are the DEFERRED_ACCEPT change, which is already
> > reverted in the very latest Linus' tree (even -rc6 is too old for that),
> > and a few FRTO fixes (you can exclude FRTO by setting the
> > /proc/sys/net/ipv4/tcp_frto sysctl to 0, but it seems quite unlikely to
> > change anything); your problem might well come from something else, with
> > the TCP hang being just a symptom of another problem downstream.
>
> I don't understand everything, but what I can say is that with exactly the
> same software, I get no hangs with 2.6.25-2 but I do get some with
> 2.6.26-rc5.
Yes, I understand that... I was just trying to list above what has
changed between those kernels :-). ...Actually, there were rather few
TCP-related changes (there were also some TSO-related changes, but they're
not significant in your case).
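Checking and flipping the F-RTO sysctl mentioned above can be scripted. This is just a sketch that does the same as `cat` and `echo 0 >` on /proc/sys/net/ipv4/tcp_frto (writing the real file needs root). The path is the one named in the thread; the function names are made up, and the path is a parameter only so the functions can be tried on an ordinary file:

```python
from pathlib import Path

# sysctl path named in the thread; a parameter only so it can be tested
TCP_FRTO = Path("/proc/sys/net/ipv4/tcp_frto")


def read_frto(path: Path = TCP_FRTO) -> int:
    """Return the current tcp_frto value (0 means F-RTO is turned off)."""
    return int(path.read_text().strip())


def disable_frto(path: Path = TCP_FRTO) -> None:
    """Same as `echo 0 > /proc/sys/net/ipv4/tcp_frto`; root needed for the real file."""
    path.write_text("0\n")


if __name__ == "__main__":
    print("tcp_frto before:", read_frto())
    disable_frto()
    print("tcp_frto after:", read_frto())
```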
> > So please gather this information (at least for the relevant connections):
> >
> > $ netstat -pn
> > $ cat /proc/net/tcp
>
> Attached.
I probably wasn't specific enough. ...I meant that you should collect this
once one of the ssh sessions gets stuck (do it right after you notice that
the session is stuck; that should capture the info before the connection is
cut down). This info should be collected from both ends (client and
server).
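When the /proc/net/tcp output from both ends is available, the interesting columns for a stuck connection are the state, tx_queue (data the peer has not yet acknowledged), the timer field and the retransmit count. A small decoder, as a sketch only: it assumes the IPv4 format and a little-endian host such as amd64 (the address column is the raw network-order value printed as hex), and the function names are made up for illustration:

```python
import socket
import struct

# Hex TCP state codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def decode_addr(field: str) -> tuple:
    """Decode 'AABBCCDD:PPPP' into (dotted IP, port); assumes a little-endian host."""
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def parse_line(line: str) -> dict:
    f = line.split()
    tx_queue, rx_queue = (int(x, 16) for x in f[4].split(":"))
    timer, _when = f[5].split(":")
    return {
        "local": decode_addr(f[1]),
        "remote": decode_addr(f[2]),
        "state": TCP_STATES.get(int(f[3], 16), f[3]),
        "tx_queue": tx_queue,        # queued bytes not yet acked by the peer
        "rx_queue": rx_queue,        # received bytes not yet read by the app
        "timer": int(timer, 16),     # 1 means the retransmit timer is pending
        "retransmits": int(f[6], 16),
    }


def parse_proc_net_tcp(text: str) -> list:
    # The first line is the column header; skip it and any blank lines.
    return [parse_line(l) for l in text.splitlines()[1:] if l.strip()]


if __name__ == "__main__":
    with open("/proc/net/tcp") as fh:
        for row in parse_proc_net_tcp(fh.read()):
            if 22 in (row["local"][1], row["remote"][1]):  # ssh only
                print(row)
```

A stuck session showing a non-zero tx_queue, a pending retransmit timer and a growing retransmit count would hint that segments or their ACKs are not getting through; comparing both ends helps tell which direction is affected.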
> > ...Also a tcpdump might be handy (though I don't know yet).
>
> Well. It seems that there is another bug here: every time I tried a
Ah, let's try to figure that one out as well...
> # tcpdump -w /tmp/tcpdump.wlan0 -i wlan0
>
> I got a CPU lockup (or something similar; I can't tell exactly, but the
> keyboard was blocked and nothing could be done).
You probably run it under X, right? Then please switch beforehand to a text
console (Ctrl-Alt-Fn, where n < 6), log in, run that command there, and see
whether you get some output on the screen. If you see something (e.g., a
sudden OOPS message or some other warning) when it locks up, the easiest
thing is to take a shot with a digicam (or write it down somewhere else)
and send that shot (or those details) to us, please.
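If it helps, the capture can also be narrowed to the ssh traffic with a short snap length, which keeps the amount of captured data down; whether that makes any difference to the lockup is unknown. A sketch that only builds and runs the command line, using the standard tcpdump options -i, -w, -s and -c plus a `tcp port 22` filter; the helper name and the 256-byte snap length are assumptions:

```python
import subprocess


def tcpdump_cmd(iface: str, outfile: str, port: int = 22,
                snaplen: int = 256, count: int = 0) -> list:
    """Build a tcpdump argv (hypothetical helper).

    -i  capture interface
    -w  write raw packets to a file for later analysis
    -s  bytes kept per packet; should cover the IP and TCP headers
    -c  stop after this many packets (0 here means no limit)
    """
    cmd = ["tcpdump", "-i", iface, "-w", outfile, "-s", str(snaplen)]
    if count > 0:
        cmd += ["-c", str(count)]
    # capture filter: only the ssh connections
    cmd += ["tcp", "port", str(port)]
    return cmd


if __name__ == "__main__":
    # Run as root from a text console rather than under X.
    subprocess.run(tcpdump_cmd("wlan0", "/tmp/tcpdump.wlan0"), check=True)
```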
...Once you have a tcpdump, I can probably figure at least something out
(though it might still just point in the right direction rather than
expose the actual cause).
--
i.