Date:	Mon, 26 May 2008 20:08:37 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Ingo Molnar <mingo@...e.hu>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

On Mon, 26 May 2008, Ingo Molnar wrote:

> 
> * Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
> 
> > > > Hmm, readfds is NULL isn't it?!? Are you sure you straced the 
> > > > right process?
> > > 
> > > yes, i'm stracing the task that is hung unexpectedly.
> > 
> > But that wasn't the receiving process? (I didn't quickly find into 
> > which direction distcc ports go, so I couldn't confirm this). If you 
> > still have that situation at hand, could you check which is the 
> > receiving process (e.g., using netstat -p, the end which has Recv-Q is 
> > the right one) and where it's stuck?
> 
> it wasn't the receiving process. There's no receiving process - which is 
> weird:
> 
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
> tcp        0 207232 europe:37198                europe:distcc               ESTABLISHED 19578/distcc        
> tcp        0      0 europe:ssh                  dione:36284                 ESTABLISHED -                   
> tcp        0      0 europe:ssh                  e2:45910                    ESTABLISHED -                   
> tcp    72283      0 europe:distcc               europe:37198                ESTABLISHED -                   

Just to be sure (please forgive me if you find this nearly an insult :-)), 
did you have enough rights to find out the pid (i.e., if that process is not 
owned by you, then you need superuser privs for that)?
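
To make the check concrete, here is a minimal sketch in Python of what I
mean by "the end which has Recv-Q": it parses netstat -p style output and
picks the sockets with unread data, noting whether an owner was shown. The
function names (parse_netstat, receivers_with_backlog) are hypothetical
helpers made up for illustration, and the sample is just the table quoted
above with the quote markers removed.

```python
SAMPLE = """\
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0 207232 europe:37198                europe:distcc               ESTABLISHED 19578/distcc
tcp        0      0 europe:ssh                  dione:36284                 ESTABLISHED -
tcp        0      0 europe:ssh                  e2:45910                    ESTABLISHED -
tcp    72283      0 europe:distcc               europe:37198                ESTABLISHED -
"""


def parse_netstat(text):
    """Parse TCP rows of `netstat -p` style output into dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the column header and anything that is not a TCP row.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        owner = " ".join(fields[6:])
        rows.append({
            "recv_q": int(fields[1]),
            "send_q": int(fields[2]),
            "local": fields[3],
            "foreign": fields[4],
            "state": fields[5],
            # "-" means netstat could not show an owning process.
            "owner": None if owner == "-" else owner,
        })
    return rows


def receivers_with_backlog(rows):
    """Return the sockets that have unread data queued (Recv-Q > 0)."""
    return [r for r in rows if r["recv_q"] > 0]


if __name__ == "__main__":
    for r in receivers_with_backlog(parse_netstat(SAMPLE)):
        shown = r["owner"] or "no owner shown (rerun as root to be sure)"
        print(r["local"], "<-", r["foreign"], "Recv-Q", r["recv_q"], shown)
```

A "-" in the last column does not by itself tell whether the socket has no
owning process or whether netstat simply lacked the privileges to see it,
which is why I'm asking about the rights.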

> i just gave it as a general example of why sometimes stracing a task can 
> 'disturb' the observed system and can kick the TCP state machine out of 
> a stall. I did not say it's occurring here.

Yeah, I understood that earlier. Similarly, I just wanted to point out
the end where the problem lies :-).

> > ...It may still be that the receiving process is stuck due to the 
> > non-net related changes you have there.
> 
> the socket does not seem to be owned. It should have closed down? 
> Refcounting issue?

It's quite possible that, e.g., net namespaces have some bug in the
handling of orphaned TCP sockets.

> find below the sysrq-t dump.

...I'll have a look into that as well (though I'm on less familiar 
territory there, so it will take a moment).


-- 
 i.
