netdev - Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+


Date:	Fri, 6 Jun 2008 21:25:42 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Patrick McManus <mcmanus@...ksong.com>
cc:	Ingo Molnar <mingo@...e.hu>, David Miller <davem@...emloft.net>,
	peterz@...radead.org, LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>, rjw@...k.pl,
	Andrew Morton <akpm@...ux-foundation.org>, johnpol@....mipt.ru
Subject: Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections,
 v2.6.26-rc3+

On Fri, 6 Jun 2008, Patrick McManus wrote:

> > This Ingo's testcase should anyway be quite "simple", I mean that distcc 
> > shouldn't do anything unexpected in a sense it shouldn't abort the flows 
> > by not sending data, close the listening socket or other things like that.
> 
> maybe - I've noted that I can get the distcc server to crash with just a
> little fuzz (telnet to it and close the telnet) - but it is true I
> haven't seen anything odd using the distcc client.

In addition, I think I've also seen reports floating around that distcc
occasionally does something weird even in a correct setup.

I briefly looked at how distcc behaved while running stress_accept. Distcc
basically seems to have n processes, each accept()ing, with some kind of
memleak guard that limits the number of successive accepts before the
process exits, while the parent that did the listen only periodically (it
had some sleep(1)s) collects the dead ones and respawns them.
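
To make that structure concrete, here is a rough Python sketch of such an
accept model. It is not distcc's code: the names child_loop, spawn and
reap_and_respawn, the limit MAX_ACCEPTS and the pool size NUM_CHILDREN are
all made up for illustration, and it assumes a Unix system with fork().

```python
import os
import socket
import time

MAX_ACCEPTS = 3   # hypothetical per-child limit on successive accepts
NUM_CHILDREN = 2  # hypothetical pool size (the "n processes")


def child_loop(listener, max_accepts=MAX_ACCEPTS):
    """Serve a bounded number of connections, then exit the process."""
    try:
        for _ in range(max_accepts):
            conn, _addr = listener.accept()
            with conn:
                conn.sendall(b"ok\n")  # stand-in for the real work
    finally:
        # Exit without running the parent's cleanup handlers.
        os._exit(0)


def spawn(listener):
    """Fork one child that accept()s on the shared listening socket."""
    pid = os.fork()
    if pid == 0:
        child_loop(listener)  # never returns
    return pid


def reap_and_respawn(listener, children):
    """Collect exited children without blocking and start replacements.

    Returns the number of children that were respawned.
    """
    respawned = 0
    for pid in list(children):
        done, _status = os.waitpid(pid, os.WNOHANG)
        if done == 0:
            continue  # still running
        children.discard(pid)
        children.add(spawn(listener))
        respawned += 1
    return respawned


def run_parent(listener, rounds, interval=1.0):
    """Parent side: spawn the pool, then reap only periodically."""
    children = {spawn(listener) for _ in range(NUM_CHILDREN)}
    for _ in range(rounds):
        time.sleep(interval)  # mirrors the sleep(1) in the parent
        reap_and_respawn(listener, children)
    return children
```

Note that between a child's exit and the parent's next wakeup there are
fewer processes sitting in accept() on the listening socket, so new
connections may have to wait in the listen queue for a while.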

> Anyhow, my news is that using rc5 I have managed to reproduce it on
> localhost - so it isn't just ingo anymore ! ;)

Peter Z also reported it earlier; it was distcc+localhost for him as
well.

> and has intentionally broken dependencies so it just keeps recompiling 
> stuff.

...Trying to invent a perpetual motion machine? :-/

> The input files are
> approximately 135k, 98k, and 16k after running gcc -E on them (which is
> what I assume distcc does before putting them down the socket).
>
> On rc5 I could get the lockup in under 20 minutes.. usually 10. I think
> I did it 4 times. My compile test is probably a better trigger than the
> kernel compile because the distcc connects are never staggered like they
> would be in a large directory of files. (3 files, -j4).

It could be even easier if you make the next gcc in the path play with
nice; trying a number of different values might reveal a scenario that
reproduces really fast.
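
One way to do that, as a sketch only: put a small wrapper named gcc earlier
in PATH than the real compiler and have it re-exec the real one under
nice. The path in REAL_GCC, the GCC_NICE environment variable and the
helper build_argv are assumptions for illustration, not anything distcc
provides.

```python
import os
import sys

REAL_GCC = "/usr/bin/gcc"  # assumption: location of the real compiler
# Hypothetical knob so different nice values can be tried without edits.
NICE_LEVEL = int(os.environ.get("GCC_NICE", "10"))


def build_argv(args, nice_level=NICE_LEVEL, real_gcc=REAL_GCC):
    """Build the command line that runs the real gcc under nice."""
    return ["nice", "-n", str(nice_level), real_gcc, *args]


def main():
    argv = build_argv(sys.argv[1:])
    # Replace this process with nice, which then runs the real gcc.
    os.execvp(argv[0], argv)


# Only act as the wrapper when installed under the name "gcc".
if os.path.basename(sys.argv[0]) == "gcc":
    main()
```

With the wrapper in place, something like GCC_NICE=5 in the environment
of the distcc server would change the value between runs.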

> When I apply the locking patch you (Ilpo) wrote, I cannot reproduce the
> error at all in the first 90 minutes of testing. I'll let the test run
> and update the list.

At least it helps some :-), like it should.

> I'm holding out hope that Ingo's report did not have the locking patch
> on the distcc server end - because it certainly makes a difference for 
> me.

...He had some issue with different versions being deployed at least in 
the past, and I failed to follow his latest answer :-).

-- 
 i.