Date:	Tue, 22 Jul 2008 23:12:56 +0200
From:	Willy Tarreau <w@....eu>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	David Newall <davidn@...idnewall.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	David Miller <davem@...emloft.net>, akpm@...ux-foundation.org,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Stefan Richter <stefanr@...6.in-berlin.de>
Subject: Re: [TCP bug] stuck distcc connections in latest -git

On Tue, Jul 22, 2008 at 05:34:43PM +0200, Ingo Molnar wrote:
> * David Newall <davidn@...idnewall.com> wrote:
> 
> > Ingo Molnar wrote:
> > > * David Newall <davidn@...idnewall.com> wrote:
> > >   
> > >> You really should start that capture, and on both client and server. 
> > >> You don't need to dump everything, only traffic to or from 
> > >> server:distcc.
> > >>     
> > >
> > > It's not feasible. That box did in excess of 200 GB of network traffic 
> > > in the past 7 hours alone.
> > 
> > You only need distcc traffic, and perhaps only after it's hung.  With 
> > 250k outstanding per socket, are you certain that no traffic was sent? 
> > Is it certain that one packet wasn't being sent every three minutes?  I 
> > suppose you're right and the stack really is stuck, but this is such 
> > an easy thing to check and eliminate that you should do so.  I 
> > suppose, too, that you should trace the server-side processes and 
> > confirm that they are waiting for socket input.  You should dump tcp 
> > (for the distcc port) next time the problem recurs and also check that 
> > the server processes are waiting for socket input.
> 
> ok, will do that if it happens again.

Ingo,

if it can help, I have a "capture" script which allows you to define
a size and will rotate captures within that size. That's what I'm
using to troubleshoot rarely occurring problems in datacenters, so
it's horrible but efficient :-)

You just have to stop it once the problem has happened again. Ping
me if you're interested (in fact, I'm too lazy to start my laptop
just for that right now).
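
For illustration only, and not the script mentioned above: a size-bounded
rotating capture can be sketched with tcpdump's -C (rotate after N million
bytes) and -W (keep at most N files, overwriting the oldest) options. The
helper name, the file prefix, the split into 10 files and the use of port
3632 (distcc's default) are assumptions made for this sketch.

```python
import shlex


def build_capture_command(total_mb, file_count=10, port=3632,
                          interface="any", prefix="distcc.pcap"):
    """Build a tcpdump argv that keeps roughly total_mb of capture on disk.

    Hypothetical helper: the total budget is split evenly across
    file_count ring-buffer files.
    """
    if file_count < 1 or total_mb < file_count:
        raise ValueError("need file_count >= 1 and total_mb >= file_count")
    per_file_mb = total_mb // file_count
    return [
        "tcpdump", "-n",
        "-i", interface,          # "any" captures on all interfaces (Linux)
        "-s", "0",                # do not truncate packets
        "-C", str(per_file_mb),   # rotate after this many million bytes
        "-W", str(file_count),    # ring buffer: overwrite the oldest file
        "-w", prefix,
        f"tcp port {port}",       # only distcc traffic (assumed default port)
    ]


if __name__ == "__main__":
    # Print the command line rather than running it; tcpdump needs root.
    print(shlex.join(build_capture_command(1000)))
```

Run the printed command on both client and server, and stop it with Ctrl-C
once the hang has shown up again; the most recent traffic is then still in
the ring of files.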

Willy

