Date:	Mon, 26 May 2008 19:32:14 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Ingo Molnar <mingo@...e.hu>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

On Mon, 26 May 2008, Ingo Molnar wrote:

> 
> * Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
> 
> > On Mon, 26 May 2008, Ingo Molnar wrote:
> > 
> > > there's a hung distcc task on the system, waiting for socket action 
> > > forever:
> > > 
> > >   [root@...ope ~]# strace -fp 19578
> > >   Process 19578 attached - interrupt to quit
> > >   select(5, NULL, [4], [4], {82, 90000} <unfinished ...>
> > 
> > Hmm, readfds is NULL isn't it?!? Are you sure you straced the right 
> > process?
> 
> yes, i'm stracing the task that is hung unexpectedly.

But that wasn't the receiving process, was it? (I couldn't quickly find out 
in which direction distcc connections go, so I couldn't confirm this.) If 
you still have that situation at hand, could you check which process is the 
receiving one (e.g., using netstat -p; the end that has a non-zero Recv-Q 
is the right one) and where it's stuck?
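
For reference, the same Recv-Q information can be read from /proc/net/tcp 
on Linux. The following is only a rough sketch, assuming the usual column 
layout of that file (tx_queue:rx_queue in hex in the fifth column, inode in 
the tenth); the function names and the SAMPLE text are made up for 
illustration and are not taken from the system in question:

```python
# Made-up sample in /proc/net/tcp layout (not real data from this report).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0E30 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 11111 1 0000000000000000 100 0 0 10 0
   1: 0100007F:0E30 0100007F:A1B2 01 00000000:0001F400 00:00000000 00000000  1000        0 54321 1 0000000000000000 20 4 30 10 -1
"""

TCP_ESTABLISHED = 0x01


def parse_proc_net_tcp(text):
    """Parse /proc/net/tcp-style text into a list of dicts."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue
        tx_hex, rx_hex = fields[4].split(":")
        rows.append({
            "local_port": int(fields[1].rsplit(":", 1)[1], 16),
            "remote_port": int(fields[2].rsplit(":", 1)[1], 16),
            "state": int(fields[3], 16),
            "tx_queue": int(tx_hex, 16),
            "rx_queue": int(rx_hex, 16),  # what netstat shows as Recv-Q
            "inode": int(fields[9]),
        })
    return rows


def sockets_with_unread_data(text):
    """Established sockets whose owner has not yet read queued data."""
    return [r for r in parse_proc_net_tcp(text)
            if r["state"] == TCP_ESTABLISHED and r["rx_queue"] > 0]


if __name__ == "__main__":
    for row in sockets_with_unread_data(SAMPLE):
        print(row)
```

On a live system one would pass the contents of /proc/net/tcp instead of 
SAMPLE and then match the inode against /proc/<pid>/fd to find the owning 
process, which is roughly what netstat -p does.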

> > > disturbing that task via strace did not change the state of the 
> > > socket - and that's not unexpected as it's a select(). [TCP state 
> > > might be affected if strace impacted a recvmsg or a sendmsg wait 
> > > directly.]
> > 
> > I fail to understand this paragraph due to excessive negation... :-)
> 
> i mean, sometimes a TCP connection can get 'unstuck' if you strace a 
> task - that is because the TCP related syscall the task sits in gets 
> interrupted. But in this case it's select() which doesn't explicitly take 
> the socket, doesn't do any tcp_push_pending_frames() processing, etc. - 
> it just sits on the socket waitqueue AFAICS. And that's expected.

The problem is not at the sender end at all. It's correct flow-control 
behavior for the sender to stop until the reading end makes more room 
available. Thus tcp_push_pending_frames() couldn't have sent anything 
anyway.
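
To illustrate the flow-control point, here is a minimal user-space sketch 
(my own illustration, not distcc code; the function names and timeouts are 
arbitrary assumptions). A sender on a localhost TCP connection writes until 
the kernel refuses more data, after which select() on writefds times out 
for as long as the receiver does not read. Once the receiver drains its 
queue, the sender becomes writable again:

```python
import select
import socket


def fill_until_blocked(sock, chunk=b"x" * 65536):
    """Send on a non-blocking socket until the kernel refuses more data."""
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            return total


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()
    sender.setblocking(False)

    # Keep writing until both the send buffer and the peer's receive
    # buffer are full, i.e. select() no longer reports the socket writable.
    sent = 0
    while True:
        sent += fill_until_blocked(sender)
        _, writable, _ = select.select([], [sender], [], 0.5)
        if not writable:
            break

    # The receiver has read nothing, so the sender stays blocked.
    _, writable, _ = select.select([], [sender], [], 0.5)
    blocked_while_unread = not writable

    # Now the reading end makes room by consuming everything that was sent.
    receiver.settimeout(5.0)
    received = 0
    while received < sent:
        data = receiver.recv(65536)
        if not data:
            break
        received += len(data)

    # With room available again, the sender should become writable.
    _, writable, _ = select.select([], [sender], [], 5.0)
    writable_after_read = bool(writable)

    sender.close()
    receiver.close()
    return sent, blocked_while_unread, writable_after_read


if __name__ == "__main__":
    print(demo())
```

In this sketch nothing done on the sending side can unblock the 
select(); only the reader consuming data does.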

...It may still be that the receiving process is stuck due to the 
non-networking changes you have there.

-- 
 i.
