netdev - Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

Message-ID: <Pine.LNX.4.64.0805311839540.2657@wrl-59.cs.helsinki.fi>
Date:	Sat, 31 May 2008 19:09:52 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	"Håkon Løvdal" <hlovdal@...il.com>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	"David S. Miller" <davem@...emloft.net>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

Thanks for reporting!

On Sat, 31 May 2008, Håkon Løvdal wrote:

> posted a few days ago are somewhat different from mine, however I believe
> this is the same problem or at least related. Just as Ingo experienced,
> netstat -p only shows PID/program as '-' for the hung connections while
> for other connections it shows the expected results.

Hmm, are the other end's processes still there? ...I'd be interested to 
know what they're doing at the moment...

> I have recently bought a new PC and have started the process of copying
> stuff from my old PC to the new PC. During this I have experienced this
> hang several times. I started copying by using tar on both ends over an ssh
> pipe, but in order to eliminate possible ssh problems I have also tried tar
> over a ttcp connection, which also fails. There is no obvious pattern to
> when this happens; I have experienced failures after transferring
> 1.15GB, 51.4GB and 23.6GB.
>
> Here is the output from netstat -n -o filtered for port 22 and slightly
> edited. All the lines started with Proto == tcp and Recv-Q == 0.

...The receiving end's state would be more interesting.

> Send-Q Local Addr Foreign Addr  State       Timer
>      0 old_pc:22  new_pc:52667  ESTABLISHED keepalive (3513.93/0/0)
>      0 old_pc:22  new_pc:43825  ESTABLISHED keepalive (5467.38/0/0)
>   2896 old_pc:22  new_pc:58601  ESTABLISHED on (21020884.65/0/0)
>   4344 old_pc:22  new_pc:54105  ESTABLISHED on (21017016.33/0/0)
>   2896 old_pc:22  new_pc:34149  ESTABLISHED on (20986889.24/0/0)
> 
> The first two connections are ongoing, working, interactive ssh
> connections. The other three connections died days ago on my new PC.

Died? Do you mean that they don't exist at all at the other end anymore?

> One thing that caught my eyes was these very high timer values.
> Checking the netstat source reveals that the value printed is "(double)
> time_len / HZ" and that time_len is extracted from /proc/net/tcp. While
> my CONFIG_HZ is 1000, I assume netstat has picked up HZ as 100 from
> /usr/include/asm/param.h, and then things really seem to imply that
> there is some integer overflow since 2^31 = 2147483648.

...plain /proc/net/tcp would be much nicer to read and without all such 
conversion troubles ;-).

> Looking into get_tcp4_sock in net/ipv4/tcp_ipv4.c I see that timer_expires
> is initialized with icsk->icsk_timeout for the troublesome cases. But
> here my competence to trace this further stops, so I have no idea of
> how icsk->icsk_timeout gets such high values.
>
> My old PC is currently still running with these stalled connections
> present so let me know if there is something I should try to investigate
> further.
>
> I can post output from /proc/net/tcp

For both ends that would be great.

> and my .config if you want to have a look.

Not needed I think.

> My old PC is 32 bit/Celeron single core, kernel 2.6.24,
> while my new is 64 bit/Q9300 quad core, kernel 2.6.25.3.
> The ethernet cards are the following:
> 
> 02:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
> 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)


-- 
 i.