Date:	Fri, 6 Jun 2008 22:49:59 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Ingo Molnar <mingo@...e.hu>
cc:	Patrick McManus <mcmanus@...ksong.com>,
	David Miller <davem@...emloft.net>, peterz@...radead.org,
	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>, rjw@...k.pl,
	Andrew Morton <akpm@...ux-foundation.org>, johnpol@....mipt.ru
Subject: Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections,
 v2.6.26-rc3+

On Fri, 6 Jun 2008, Ingo Molnar wrote:

> 
> * Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
> 
> > If you want an older kernel, you would have to go basically to 2.6.25 
> > or so.
> 
> correct, that's what i use as fallback, some distro kernel which is 
> 2.6.25 or older.
> 
> but i'm confused a bit, you say v2.6.25-rc6-475-gec3c098 introduced the 
> locking problem - so 2.6.25 is affected as well?

No, you're probably just falling into a git-describe trap that I also used
to fall into:

ijjarvin@...nthope:~/linux/mainline$ git-log -n 1 --pretty=oneline ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ^v2.6.25 | cat -
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 [TCP]: TCP_DEFER_ACCEPT updates - process as established
ijjarvin@...nthope:~/linux/mainline$ git-log -n 1 --pretty=oneline ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ^v2.6.26-rc1 | cat -
ijjarvin@...nthope:~/linux/mainline$ git-describe ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
v2.6.25-rc6-475-gec3c098
ijjarvin@...nthope:~/linux/mainline$

git-describe is not a way to determine which mainline tag a commit was
first included in; it basically just provides the closest tag among the
commit's ancestors, which can be a vastly different one and has _no_
relation whatsoever to the tag we'd like to get. Here, Dave had net-2.6
based on 2.6.25-rc6-ish (or, alternatively, the last merge from Linus'
tree into net-2.6 brought in content from that point in time), but Linus
did the merge after 2.6.25, and git-describe doesn't look at anything
that happens after the given commit. This is similar to the
bisect-lands-on-a-lower-tag-than-the-selected-good-commit "mystery" that
was recently discussed extensively; again, the version in the Makefile
only reflects ancestors, not the future.

If somebody knows a trivial command to get that future information
(i.e., where a commit was merged), I'd be pretty interested to hear it.
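A minimal sketch, assuming a reasonably modern git: the script below rebuilds the trap in a throwaway repository with empty commits. The repository, the branch name "net", and the commit messages are hypothetical; only the tag names mimic the kernel ones. Plain `git describe` reports the nearest annotated tag among the commit's ancestors, while `git describe --contains` and `git tag --contains` look forward to the tags that actually include it.

```shell
#!/bin/sh
# Reproduce the git-describe trap in a throwaway repository (hypothetical
# history; tag names only imitate the kernel ones).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

# "Mainline": the -rc6 point. Plain git describe only uses annotated tags,
# hence -a everywhere.
git commit -q --allow-empty -m "mainline work"
git tag -a -m rc6 v2.6.25-rc6
mainline=$(git symbolic-ref --short HEAD)

# "net-2.6": a tree based on -rc6 carrying the commit of interest.
git checkout -q -b net v2.6.25-rc6
git commit -q --allow-empty -m "[TCP]: change under investigation"
fix=$(git rev-parse HEAD)

# Mainline releases first and only afterwards pulls in the net tree.
git checkout -q "$mainline"
git commit -q --allow-empty -m "more mainline work"
git tag -a -m release v2.6.25
git merge -q --no-ff --no-edit -m "Merge net tree" net
git tag -a -m rc1 v2.6.26-rc1

# Nearest tag among the ancestors (the misleading answer):
git describe "$fix"              # v2.6.25-rc6-1-g<abbreviated hash>
# A name based on the first tag that contains the commit:
git describe --contains "$fix"
# Every tag that contains the commit:
git tag --contains "$fix"        # v2.6.26-rc1
```

The commit is not an ancestor of v2.6.25 at all, yet its describe name starts with v2.6.25-rc6, which is exactly the confusion above. The temporary repository in "$repo" is left behind for inspection.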

> This is a significant 
> question because the fallback kernel is kernel-2.6.25.3-18.fc9.x86_64 on 
> the 16-way box. (all other build-boxes have 2.6.24 or older as a 
> fallback kernel)

Please do get the receiver state if you still see such a problem with it;
it is also relevant, but it is a different problem then (I have yet to
analyze the data Håkan was collecting; I downloaded it already but
haven't even looked into it yet).

...Or if you see stuck TCP connections in any of the other cases that I've
said should fix it:

1. 2.6.25 (pre-ec3c to be accurate)
2. 3+1 revert
3. ec3c + locking fix (this is the least certain one, because it still
has the reversed socket lock ordering, though no problem has been found
in review either by me or by Patrick)

Please collect at least /proc/net/tcp and the output of netstat -np. If
there's a process associated with the flow that has the _Recv-Q_ (in the
localhost case there are two of them, the other one with the Send-Q),
where that process is waiting is also useful. Hopefully that's clear
enough now... :-)
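As a rough aid for spotting the flow with the non-empty receive queue, a small filter over /proc/net/tcp could look like this. It is only a sketch: the function name stuck_rx is hypothetical, and it assumes the usual /proc/net/tcp layout, where the fifth field is "tx_queue:rx_queue" in hexadecimal and the first line is a header.

```shell
#!/bin/sh
# Print sockets from /proc/net/tcp-format input whose receive queue is not
# empty. Field 5 is "tx_queue:rx_queue" in hex; the header line is skipped.
# Output columns: local address, remote address, state (hex), rx_queue (hex).
stuck_rx() {
    awk 'NR > 1 {
        split($5, q, ":")
        if (q[2] != "00000000")
            print $2, $3, $4, q[2]
    }'
}

# On a Linux box: list the flows that currently have unread data.
if [ -r /proc/net/tcp ]; then
    stuck_rx < /proc/net/tcp
fi
```

Addresses and ports are printed as raw hex (127.0.0.1 shows up as 0100007F on little-endian machines). For LISTEN sockets (state 0A) the rx_queue column holds the accept backlog rather than unread bytes. netstat -tnp shows the same queues in decimal as Recv-Q/Send-Q together with the owning PID, and /proc/<pid>/wchan names the kernel function a blocked process is sleeping in.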

> > To summarize. Both 3changes+1fix revert (you refer to it only as 
> > 3-patch revert) _and_ the locking fix I made should fix the problem 
> > (obviously they exclude each other). ...And end which is significant 
> > is the one which has LISTENing sockets (please keep this in mind if 
> > you still get the hang and provide some info).
> 
> ok.
> 
> For completeness, let me repeat the patch i referred to as the 
> '3-patch-revert' below. (which indeed is 3+1 as you note)

...I know, because there has never been any 3-patch revert made... :-)

> this is the patch that appears to be working empirically. (Disclaimer: 
> it might just hide the problem, change timings, have a lucky code 
> layout, etc.)

Sure, but the revert also removes the obvious locking problem that was 
introduced in ec3c.


--
 i.
