netdev - Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

Message-ID: <Pine.LNX.4.64.0808291045430.7971@wrl-59.cs.helsinki.fi>
Date:	Fri, 29 Aug 2008 16:07:04 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	"Dâniel Fraga" <fragabr@...il.com>
cc:	David Miller <davem@...emloft.net>, thomas.jarosch@...ra2net.com,
	billfink@...dspring.com, Netdev <netdev@...r.kernel.org>,
	Patrick Hardy <kaber@...sh.net>,
	netfilter-devel@...r.kernel.org, kadlec@...ckhole.kfki.hu
Subject: Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

On Thu, 28 Aug 2008, Dâniel Fraga wrote:

> Well, I hope this time you have more information and I hope I 
> didn't forget anything. If not, let's keep trying.

Thanks. It took me a moment to analyze such a sheer amount of data, 
but I'm used to large logs... :-)

Can you check during a "normal" time whether ListenOverflows grows at as 
considerable a rate as during the stall? (No need to send that log to me; 
just confirming that it doesn't is enough.) A little cheat to do that 
for a logfile (the command I used):

grep -A1 "ListenOverflows" <log> | cut -d ' ' -f 21-22 | grep '[0-9]'
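
A minimal sketch of the same check that looks the counter up by name 
instead of by field position, since the position of ListenOverflows in the 
TcpExt line can differ between kernel versions. It assumes the log holds 
raw /proc/net/netstat snapshots (a "TcpExt:" header line followed by a 
"TcpExt:" value line); the function name listen_overflows is made up for 
this example:

```shell
# Print ListenOverflows from every TcpExt header/value pair found in a
# file laid out like /proc/net/netstat.
listen_overflows() {
    awk '
        $1 == "TcpExt:" {
            if ($2 !~ /^-?[0-9]+$/) {
                # Header line: remember which column is ListenOverflows.
                col = 0
                for (i = 2; i <= NF; i++)
                    if ($i == "ListenOverflows") col = i
            } else if (col) {
                # Value line: print the counter from the remembered column.
                print $col
            }
        }
    ' "$1"
}
```

Running `listen_overflows <log>` prints one value per snapshot, so the 
growth rate during a stall and during a "normal" period can be compared 
side by side.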

> Important: these data were collected with frto disabled (0) and htb 
> disabled too. So it isn't related to frto, neither htb.

I kind of assumed/knew that since the htb patch didn't solve it.

...When you use nmap to resolve the stall, does it always take the same 
amount of time, or do you run it until the situation resolves?

There are constantly 9 items in sk_ack_backlog (i.e., connections which 
have not yet been accept()ed); those connections are in TCP_CLOSE_WAIT. 
Then there are ~7 connections hanging in SYN_RECV which cannot make 
progress (all of them from a single address, besides two flows of yours 
in SYN_RECV).

So I guess that the configured 128 is not related to the number that 
is given to the listen() syscall, as that seems to be 9.
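
One way to cross-check the accept queue without kernel-level logs, as a 
sketch: for sockets in LISTEN state, `ss -ltn` reports the current accept 
queue length in the Recv-Q column and the backlog limit in the Send-Q 
column. The helper name accept_queue is made up for this example, and it 
assumes the usual column order of `ss -ltn` output (State, Recv-Q, Send-Q, 
Local Address:Port, ...):

```shell
# Read "ss -ltn"-style text on stdin and print "current/limit" for the
# listening socket bound to the port given as $1.
accept_queue() {
    awk -v port="$1" '
        # Match LISTEN rows whose local address ends in ":<port>".
        $1 == "LISTEN" && $4 ~ (":" port "$") { print $2 "/" $3 }
    '
}
```

Usage would be `ss -ltn | accept_queue 995`. A first number that stays at 
or near the second one would fit the picture of a full accept queue 
described above.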

...Next we need to find out why dovecot is not accept()ing or is doing 
that dead slow (the client's state is hardly significant, so I guess 
it's no longer mandatory to collect it every time)...

Can you provide the output of these, so that I can familiarize myself a 
bit with the server's environment (no need to wait for the stall):

ps ax | grep dovecot  (or whatever the process is named)
netstat -p -n -l | grep "995"

But you'll mostly have to resort to strace during the stall. I recommend 
tracing just a subset of the syscalls, e.g. at least these:

strace -e trace=accept,listen,close,shutdown,select

...as it would probably not be wise to make a full dump available (since 
it would contain every syscall). Alternatively, you can create one full 
dump for yourself and just grep the relevant parts. You may need to strace 
more than one process (all the dovecot-related ones).
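
Attaching to several processes at once could be sketched as below. The 
helper name strace_cmd is made up for this example; it only prints the 
command line so that it can be reviewed before running. It relies on 
strace accepting multiple -p options, with -f to follow forks and -tt to 
add timestamps:

```shell
# Print an strace command line that attaches to every PID passed as an
# argument, limited to the syscalls suggested above.
strace_cmd() {
    args=""
    for pid in "$@"; do
        args="$args -p $pid"    # one -p option per process
    done
    echo "strace -f -tt -e trace=accept,listen,close,shutdown,select$args"
}
```

For instance, `strace_cmd $(pgrep dovecot)` prints a command covering all 
processes whose name contains "dovecot"; adding `-o <file>` to the printed 
command keeps the trace out of the terminal. Related processes with other 
names would need their own pgrep pattern.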

-- 
 i.