Date:	Wed, 4 Jun 2008 00:46:55 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Ingo Molnar <mingo@...e.hu>
cc:	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Evgeniy Polyakov <johnpol@....mipt.ru>,
	Patrick McManus <mcmanus@...ksong.com>
Subject: Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections,
 v2.6.26-rc3+

On Tue, 3 Jun 2008, Ingo Molnar wrote:

> * Ingo Molnar <mingo@...e.hu> wrote:
> 
> > > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, 
> > > sizeof(val)) seems to be the magic trick that is interesting here.
> > 
> > seems to be used:
> > 
> >  22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62
> >  22003 listen(4, 10)                     = 0
> >  22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0
> > 
> > i'll queue up your reverts for testing in -tip.
> 
> update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely 
> fixed the hangs!

...It wasn't exactly out-of-tree: Evgeniy fixed a problem that was found
in "TCP_DEFER_ACCEPT updates - process as established"; perhaps it just
wasn't in your testing tree yet.

$ git-log -n 1 --pretty=full 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38 | 
grep "Commit:"
Commit: David S. Miller <davem@...emloft.net>

> Here is the testing i did:
> 
> first i ran about 500+ successful iterations on the affected testboxes 
> with your revert patch applied, on multiple systems.

Are you sure this is enough to draw conclusions? That seems like quite a
small number to me to rule out luck, especially considering that the
change had already been in the tree for some time before you first
noticed the problem.

Anyway, nice that it seems to be helping. It was almost the only
possibility on the TCP side; I don't think there were any other state
machine related changes. So it wasn't just a "random revert" in the sense
you were implying :-), I just didn't have any theory of how it would cause
the problem... ...I even disregarded DA (TCP_DEFER_ACCEPT) at first
because of the timeline inexactness and because I wrongly assumed that
distcc probably wouldn't use it anyway, but then I checked later on and
found out that it was present at least in the source I had lying around.
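
For reference, a minimal userspace sketch of a listener that enables
TCP_DEFER_ACCEPT, in the same listen()/setsockopt() order as the strace
quoted above. The helper name listen_deferred(), the loopback address and
the backlog of 10 are assumptions for illustration only; this is not
distcc's actual code. It uses IPPROTO_TCP, which has the same value as
SOL_TCP on Linux.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from distcc): open a TCP listener on
 * 127.0.0.1:port and ask the kernel to defer accept() until data
 * arrives, for up to defer_secs seconds.
 * Returns the listening fd, or -1 on any failure.
 */
int listen_deferred(unsigned short port, int defer_secs)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);	/* 0 lets the kernel pick a port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 10) < 0 ||
	    /* Linux-specific option; the value is a timeout in seconds. */
	    setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
		       &defer_secs, sizeof(defer_secs)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Calling listen_deferred(3632, 1) would roughly correspond to the
setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) call in the trace.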

Anyway, the revert may have been a bit of an overkill: I'm not fully sure
that 539fae and e4c7884 need to be reverted to fix it, since the main
changes are in ec3c098. I just didn't want to take chances at first, so I
put them all on the revert list.

> Then today, without 
> changing anything else on one of the testsystems i reverted your revert 
> on that single system. After about an hour of testing, in 20 iterations 
> i got a hang again over localhost:
> 
>  titan:~> netstat -nt
>  Active Internet connections (w/o servers)
>  Proto Recv-Q Send-Q Local Address               Foreign Address             State
>  tcp        0 174592 10.0.1.14:34710             10.0.1.14:3632              ESTABLISHED
>  tcp    72145      0 10.0.1.14:3632              10.0.1.14:34710             ESTABLISHED
> 
> so i hereby conclude that your revert works :) I've repeated the commit 
> below that resolves this nasty regression.
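
A rough back-of-the-envelope check on the "luck" question, as a sketch
only: it assumes iterations are independent and takes the single hang in
20 iterations as a per-iteration hang probability of about 0.05, which is
a very weak estimate from one observation. The function name
prob_all_clean() is made up for this illustration.

```c
/*
 * Probability that n consecutive iterations all complete cleanly when
 * each iteration hangs independently with probability p.
 * Assumption: independence and a constant p; neither is established
 * by the testing described in this thread.
 */
double prob_all_clean(double p, int n)
{
	double q = 1.0;
	int i;

	for (i = 0; i < n; i++)
		q *= 1.0 - p;	/* one more iteration without a hang */
	return q;
}
```

Under those assumptions, prob_all_clean(0.05, 500) comes out far below
one in a billion, so 500 clean runs would be hard to explain by luck if
the real hang rate were anywhere near 1 in 20. If the real rate were much
lower, 500 iterations would say a lot less.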

...I couldn't immediately find anything obviously wrong with those changes,
but the patch below might be worth a try (without the revert, of
course). If it ever spits out that WARN_ON for you, we were playing with
fire too much and it's better to return to the safe side there...

-- 
 i.

[PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on

If header prediction gets turned on under some circumstances,
DA can deadlock, though I have great trouble figuring out
how that could ever happen while ending up in that else
branch (but I've been wrong before as well :-)).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
---
 net/ipv4/tcp_input.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c9454f0..0d9a3fe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4595,6 +4595,9 @@ static int tcp_defer_accept_check(struct sock *sk)
 			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
 			tcp_reset(sk);
 			return -1;
+		} else {
+			WARN_ON(tp->pred_flags);
+			tp->pred_flags = 0;
 		}
 	}
 	return 0;
-- 
1.5.2.2
