netdev - Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

Message-ID: <Pine.LNX.4.64.0809221112140.1950@wrl-59.cs.helsinki.fi>
Date:	Mon, 22 Sep 2008 14:22:12 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	"Dâniel Fraga" <fragabr@...il.com>
cc:	David Miller <davem@...emloft.net>, thomas.jarosch@...ra2net.com,
	billfink@...dspring.com, Netdev <netdev@...r.kernel.org>,
	Patrick Hardy <kaber@...sh.net>,
	netfilter-devel@...r.kernel.org, kadlec@...ckhole.kfki.hu
Subject: Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

On Mon, 22 Sep 2008, Dâniel Fraga wrote:

> On Fri, 19 Sep 2008 00:04:23 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi> wrote:
> 
> > Anyway, if/when you succeed in collecting some strace of the server
> > processes, please let me know (though making a full one available might
> > not be a wise thing, like I said earlier). After thinking about it a bit,
> > it might be enough to start strace with -p for all server processes of a
> > service during a stall and then resolve it after some amount of waiting
> > with nmap (and hope that strace doesn't resolve it by interfering with
> > something relevant :-); you will see that from the fact that it then
> > resolves without nmap). That would probably reveal whether the processes
> > were waiting in accept() or not, and if not, where they were.
> 
> 	Hi again Ilpo, I waited the whole day for a stall, and
> fortunately it happened while I was stracing dovecot and its child
> processes. The stall happened at 01:11 (at the end). I hope that it
> has something useful.

It definitely shows a stall: there are _no_ events between 0:53 and 1:11,
and there isn't any other period like that; every other minute since the
start has some activity going on :-). So this might not be related to
networking at all, as we've kind of already figured out (accept()
definitely has very little to do with it). There weren't any close()s there
either, so it looks very much stuck on something that's outside of the
syscalls we listed in -e, I suppose...
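
A gap like that can also be located mechanically rather than by eye. The
following is only a sketch: largest_gap is a made-up helper name, and it
assumes each log line carries an HH:MM:SS timestamp (as with -t or -tt) in
its first or second field, the second being the case when strace prefixes
each line with a PID.

```shell
# Print the longest silence in an strace log as "<seconds> <from> <to>".
# Assumption: the timestamp is field 1, or field 2 after a PID prefix.
largest_gap() {
    awk '
    {
        ts = ""
        for (i = 1; i <= 2 && i <= NF; i++)
            if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) { ts = $i; break }
        if (ts == "") next                  # skip lines without a timestamp
        split(ts, p, ":")
        now = p[1] * 3600 + p[2] * 60 + p[3]
        if (seen) {
            gap = now - prev
            if (gap < 0) gap += 86400       # log ran past midnight
            if (gap > max) { max = gap; from = prev_ts; to = ts }
        }
        prev = now; prev_ts = ts; seen = 1
    }
    END { if (seen) printf "%d %s %s\n", max, from, to }
    ' "$1"
}
```

Running "largest_gap dovecot.strace" (the file name is just an example)
prints the length of the longest silence in whole seconds, followed by the
timestamps on either side of it.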

It seems that the next sensible step is simply to obtain a full strace to
see what actually took place during those long minutes, if anything (it's
better that you keep that log private and just grep over it on
request). A full strace might grow huge, though. Also, run strace with
-tt instead of -t to get more accurate timestamps, and add -T to record
the time spent in each syscall.
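
For illustration, such a command line could be assembled roughly as
follows. This is a sketch only: build_strace_cmd is a hypothetical helper
that just prints the command instead of running it, and the -f (follow
children) and -o (write to a file) options are my additions on top of the
-tt, -T and -p mentioned above.

```shell
# Print (but do not run) an strace command line that attaches to every
# PID given. No -e filter is used, so all syscalls would be logged.
# Usage: build_strace_cmd OUTFILE PID...
build_strace_cmd() {
    out="$1"; shift
    cmd="strace -f -tt -T -o $out"
    for pid in "$@"; do
        cmd="$cmd -p $pid"
    done
    printf '%s\n' "$cmd"
}

# Example (assumes pgrep is available and the processes are named dovecot):
#   build_strace_cmd dovecot-full.strace $(pgrep dovecot)
```

Printing the command first makes it easy to check the PID list before
attaching; the printed line can then be pasted into a root shell.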

When you get the stall next time, please also check that the processes are 
actually sleeping instead of looping like crazy in some buggy userspace 
code :-) (obviously before resolving it with nmap).
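
One possible way to do that check, as a rough sketch: classify_stat and
check_procs are made-up names, and the stat and wchan output columns
assume the procps version of ps found on Linux.

```shell
# Map a ps STAT value to a short verdict based on its first letter.
classify_stat() {
    case "$1" in
        R*)    echo "running (possibly looping in userspace)" ;;
        S*)    echo "sleeping (interruptible wait)" ;;
        D*)    echo "uninterruptible sleep (usually I/O)" ;;
        T*|t*) echo "stopped or traced" ;;
        Z*)    echo "zombie" ;;
        *)     echo "unknown" ;;
    esac
}

# Print "PID STAT WCHAN verdict" for each PID given.
check_procs() {
    for pid in "$@"; do
        stat=$(ps -o stat= -p "$pid" | tr -d ' ')
        [ -n "$stat" ] || continue          # process is gone
        wchan=$(ps -o wchan= -p "$pid" | tr -d ' ')
        printf '%s %s %s %s\n' "$pid" "$stat" "$wchan" "$(classify_stat "$stat")"
    done
}

# Example: check_procs $(pgrep dovecot)
```

A process that stays in R with high CPU usage over several samples would
point at a userspace loop, while S or D together with the WCHAN column
hints at where in the kernel it is waiting.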

When using nmap to resolve it, take note of the exact timestamp (including
seconds). E.g.,
$ date > nmap.ts; nmap ...

-- 
 i.