Message-ID: <Pine.LNX.4.64.0805311839540.2657@wrl-59.cs.helsinki.fi>
Date: Sat, 31 May 2008 19:09:52 +0300 (EEST)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: "Håkon Løvdal" <hlovdal@...il.com>
cc: LKML <linux-kernel@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
"David S. Miller" <davem@...emloft.net>,
"Rafael J. Wysocki" <rjw@...k.pl>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
Thanks for reporting!
On Sat, 31 May 2008, Håkon Løvdal wrote:
> posted a few days ago are somewhat different from mine, however I believe
> this is the same problem or at least related. Just as Ingo experienced,
> netstat -p only shows PID/program as '-' for the hung connections while
> for other connections it shows the expected results.
Hmm, are the other end's processes still there? ...I'd be interested to
know what they're doing at the moment...
> I have recently bought a new PC and have started the process of copying
> stuff from my old PC to the new PC. During this I have experienced this
> hang several times. I started copying by using tar on both ends over an ssh
> pipe, but in order to eliminate possible ssh problems I have also tried tar
> over a ttcp connection, which also fails. There is no obvious pattern to
> when this happens; I have experienced failures after transferring
> 1.15GB, 51.4GB and 23.6GB.
>
> Here is the output from netstat -n -o filtered for port 22 and slightly
> edited. All the lines started with Proto == tcp and Recv-Q == 0.
...The receiving end's state would be more interesting.
> Send-Q  Local Addr  Foreign Addr  State        Timer
>      0  old_pc:22   new_pc:52667  ESTABLISHED  keepalive (3513.93/0/0)
>      0  old_pc:22   new_pc:43825  ESTABLISHED  keepalive (5467.38/0/0)
>   2896  old_pc:22   new_pc:58601  ESTABLISHED  on (21020884.65/0/0)
>   4344  old_pc:22   new_pc:54105  ESTABLISHED  on (21017016.33/0/0)
>   2896  old_pc:22   new_pc:34149  ESTABLISHED  on (20986889.24/0/0)
>
> The first two connections are ongoing, working, interactive ssh
> connections. The other three connections died days ago on my new PC.
Died? Do you mean that they don't exist at all at the other end anymore?
> One thing that caught my eye was these very high timer values.
> Checking the netstat source reveals that the value printed is "(double)
> time_len / HZ" and that time_len is extracted from /proc/net/tcp. While
> my CONFIG_HZ is 1000, I assume netstat has picked up HZ as 100 from
> /usr/include/asm/param.h, and then things really seem to imply that
> there is some integer overflow since 2^31 = 2147483648.
...plain /proc/net/tcp would be much nicer to read and without all such
conversion troubles ;-).
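
The arithmetic behind the quoted observation can be checked with a short
sketch. It assumes, as the quoted text does, that netstat divided the raw
value by HZ = 100; the helper name raw_timer_value is made up for
illustration. It only shows how close the recovered values sit to 2^31,
not why they ended up there.

```python
from decimal import Decimal

NETSTAT_HZ = 100        # assumption taken from the quoted text, not verified
INT32_LIMIT = 2 ** 31   # 2147483648

def raw_timer_value(printed_seconds: str, hz: int = NETSTAT_HZ) -> int:
    """Undo netstat's "(double) time_len / HZ" to recover time_len."""
    # Decimal avoids binary floating-point rounding on the printed value.
    return int(Decimal(printed_seconds) * hz)

# Timer values of the three stalled connections in the netstat output above.
PRINTED = ["21020884.65", "21017016.33", "20986889.24"]

for printed in PRINTED:
    raw = raw_timer_value(printed)
    print(f"{printed:>12} s -> time_len {raw} ({raw / INT32_LIMIT:.1%} of 2^31)")
```

Under that assumption all three stalled connections come out a little
below 2^31, while the two working keepalive timers are several orders of
magnitude smaller.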
> Looking into get_tcp4_sock in net/ipv4/tcp_ipv4.c I see that timer_expires
> is initialized with icsk->icsk_timeout for the troublesome cases. But
> here my competence to trace this further stops, so I have no idea of
> how icsk->icsk_timeout gets such high values.
>
> My old PC is currently still running with these stalled connections
> present so let me know if there is something I should try to investigate
> further.
>
> I can post output from /proc/net/tcp
For both ends that would be great.
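
For anyone wanting to read the raw timer fields without going through
netstat, here is a minimal sketch that splits one /proc/net/tcp line into
its hex fields. The sample line is synthetic (it is not output from either
machine in this thread), the names TcpRow and parse_proc_net_tcp_line are
invented for the example, and the timer labels are only my reading of how
netstat names the codes. The "when" value is left as a raw tick count on
purpose, since which tick rate applies is exactly the open question here.

```python
from typing import NamedTuple

# Labels for the "tr" timer code; assumed to match netstat's naming.
TIMER_NAMES = {0: "off", 1: "on", 2: "keepalive", 3: "timewait", 4: "probe"}

class TcpRow(NamedTuple):
    local_port: int
    remote_port: int
    state: int        # 0x01 is ESTABLISHED
    tx_queue: int
    rx_queue: int
    timer: int        # timer code, see TIMER_NAMES
    timer_when: int   # raw tick count, deliberately not converted to seconds

def parse_proc_net_tcp_line(line: str) -> TcpRow:
    """Parse the leading columns of one data line of /proc/net/tcp."""
    fields = line.split()
    # fields[0] is the slot index ("2:"); addresses are HEXIP:HEXPORT.
    local_port = int(fields[1].split(":")[1], 16)
    remote_port = int(fields[2].split(":")[1], 16)
    state = int(fields[3], 16)
    tx_hex, rx_hex = fields[4].split(":")
    timer_hex, when_hex = fields[5].split(":")
    return TcpRow(local_port, remote_port, state,
                  int(tx_hex, 16), int(rx_hex, 16),
                  int(timer_hex, 16), int(when_hex, 16))

# Synthetic example line, for illustration only.
SAMPLE = ("   2: 0100007F:0016 0100007F:E4E9 01 00000B50:00000000 "
          "01:00000014 00000000     0        0 12345")

row = parse_proc_net_tcp_line(SAMPLE)
print(row, TIMER_NAMES.get(row.timer, "unknown"))
```

To use it on a live system, skip the header line of /proc/net/tcp and feed
each remaining line to parse_proc_net_tcp_line.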
> and my .config if you want to have a look.
Not needed I think.
> My old PC is a 32-bit Celeron (single core) running kernel 2.6.24,
> while my new one is a 64-bit Q9300 (quad core) running kernel 2.6.25.3.
> The ethernet cards are the following:
>
> 02:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> RTL-8139/8139C/8139C+ (rev 10)
> 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056
> PCI-E Gigabit Ethernet Controller (rev 12)
--
i.