netdev - Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

Message-ID: <Pine.LNX.4.64.0805312332270.2760@wrl-59.cs.helsinki.fi>
Date:	Sun, 1 Jun 2008 00:39:05 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	"Håkon Løvdal" <hlovdal@...il.com>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	"David S. Miller" <davem@...emloft.net>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

On Sat, 31 May 2008, Håkon Løvdal wrote:

> 2008/5/31 Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>:
>
> > So you had that '-' earlier and you checked at that time but the
> > connection is now already dead?
>
> This is only from checking after the connection was dead.

Could you please rephrase the answer? I failed to understand it... :-)
...You said earlier that you had '-' owned connections like Ingo. When did
that happen (the connections no longer exist now, so at what point in
time did you see those non-owned connections)?

> By the way,
> I just had to remotely reboot the new machine because the window
> manager locked up; however, the old PC is still listing the defunct
> connections after this.

Ok.

> > :-(, I would so much have liked to see what they were doing.
>
> I can of course keep on copying for testing purposes, but then I would
> like to be able to dump only that single tcp connection. Any tips on how
> to do that?
> I found nothing specific in the manuals of wireshark and tcpdump. Of
> course it is possible to capture everything and filter afterwards, but
> since I will be transferring lots of data the logs will get huge, and I
> would rather not have additional traffic in them as well...

I didn't really mean tcpdump; I was thinking more of the syscall in which
the process is waiting. Though tcpdump might reveal something as well
about the behavior when nearing the problem:

tcpdump -n -i <iface> host <blahblah> and port <portno> and ...

Host & port as written above match either src or dst. I don't
remember how one could specify just one of them, but it's not usually
necessary (won't be here either).
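
For limiting the capture to one connection, here is a rough sketch that builds the filter and the command line. It assumes the `src`/`dst` qualifiers of the pcap filter language and the `-s` (bytes kept per packet) and `-w` (write to file) options of tcpdump. The helper names, addresses, ports and file name are placeholders, not values from this report.

```python
import shlex


def connection_filter(host_a, port_a, host_b, port_b):
    """Build a pcap filter matching one TCP connection in both directions."""
    one_way = "(src host {} and src port {} and dst host {} and dst port {})"
    return "tcp and ({} or {})".format(
        one_way.format(host_a, port_a, host_b, port_b),
        one_way.format(host_b, port_b, host_a, port_a),
    )


def tcpdump_argv(iface, pcap_filter, outfile, snaplen=96):
    """Return an argv list for tcpdump; nothing is executed here."""
    # -n: no name resolution, -s: bytes kept per packet, -w: raw output file
    return ["tcpdump", "-n", "-i", iface, "-s", str(snaplen),
            "-w", outfile, pcap_filter]


# Placeholder endpoints, only for illustration.
argv = tcpdump_argv(
    "lo", connection_filter("127.0.0.1", 50000, "127.0.0.1", 631), "flow.pcap")
print(" ".join(shlex.quote(a) for a in argv))
```

Keeping only the first bytes of each packet should keep the log size manageable even for a large transfer, since the headers are what matter here.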

> > These 7C/D... certainly seem strange values. Which TCP variant do you
> > have in use (cat /proc/sys/net/ipv4/tcp_congestion_control)? It seems
> > that at least vegas, veno and yeah contain 0x7fffffff there for some
> > rtt, which could perhaps somehow leak.
> 
> I have not done any specific selection myself. On old_pc: bic, new_pc: 
> cubic.

Ok, after some searching it also seems that it was a dead-end anyway:
  - icsk_retransmit_timer is only set to icsk->icsk_timeout or
    jiffies + (HZ / 20)
  - icsk_timeout is only set after if (when > max_when) limiting (in 
    unsigned quantities)
  - max_when is always given TCP_RTO_MAX by TCP... 
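
To spell out the reasoning in the list above, here is a toy model of the clamping, not the kernel code itself. The HZ value and the 32-bit wraparound are assumptions, and `model_reset_xmit_timer` is a made-up name.

```python
HZ = 250                    # assumed tick rate, for illustration only
TCP_RTO_MAX = 120 * HZ      # the kernel defines TCP_RTO_MAX as 120*HZ
U32 = 0xFFFFFFFF            # models a 32-bit unsigned long


def model_reset_xmit_timer(jiffies, when, max_when=TCP_RTO_MAX):
    """Return the modelled icsk_timeout for a requested delay 'when'."""
    when &= U32             # the comparison is done on unsigned quantities
    if when > max_when:     # the "if (when > max_when)" limiting step
        when = max_when
    return (jiffies + when) & U32


now = 1000
for requested in (HZ // 5, 0x7FFFFFFF, -1):
    timeout = model_reset_xmit_timer(now, requested)
    # The distance to expiry never exceeds TCP_RTO_MAX in this model.
    print(hex(requested & U32), (timeout - now) & U32)
```

Even a leaked 0x7fffffff, or a negative value reinterpreted as unsigned, ends up at most TCP_RTO_MAX away in this model, which is why that path looked like a dead end.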

...I'm currently out of ideas with this one then, I think I checked all 
types too and nothing came up :-(.

Hmm, perhaps periodically checking /proc/net/tcp (e.g., once per 10s) for
a timeout larger than TCP_RTO_MAX might allow some script to notice
immediately when things break while reproducing it. Storing all those
once-per-10s values shouldn't take too much space either; it could even be
done at both ends for a single flow (but I'll leave writing a script for
that until Monday).
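
A rough sketch of such a check follows. It assumes the sixth whitespace-separated column of /proc/net/tcp is "timer_active:time-until-expiry" in hex, and that the time is reported in clock ticks (SC_CLK_TCK per second); both assumptions should be checked against the running kernel. The function names are made up for this sketch.

```python
import os
import time

TCP_RTO_MAX_SECONDS = 120   # TCP_RTO_MAX is 120*HZ, i.e. 120 seconds


def parse_timers(text):
    """Yield (local, remote, state, timer_active, when_ticks) per socket."""
    for line in text.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        timer_active, when_hex = fields[5].split(":")
        yield (fields[1], fields[2], fields[3],
               int(timer_active, 16), int(when_hex, 16))


def suspicious(text, ticks_per_second):
    """Return sockets with an armed timer further away than TCP_RTO_MAX."""
    limit = TCP_RTO_MAX_SECONDS * ticks_per_second
    return [e for e in parse_timers(text) if e[3] != 0 and e[4] > limit]


def watch(path="/proc/net/tcp", interval=10):
    """Poll 'path' forever, printing a timestamped line per suspicious socket."""
    ticks = os.sysconf("SC_CLK_TCK")            # assumed unit of tm->when
    while True:
        with open(path) as f:
            snapshot = f.read()
        for entry in suspicious(snapshot, ticks):
            print(time.strftime("%H:%M:%S"), *entry)
        time.sleep(interval)
```

Calling `watch()` on both machines while reproducing should print a line as soon as some timer goes beyond the limit; addresses are left in the hex form used by /proc/net/tcp.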

-- 
 i.