Message-ID: <Pine.LNX.4.64.0807181617550.24938@wrl-59.cs.helsinki.fi>
Date:	Fri, 18 Jul 2008 16:55:22 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Thomas Jarosch <thomas.jarosch@...ra2net.com>
cc:	Jozsef Kadlecsik <kadlec@...ckhole.kfki.hu>,
	Netdev <netdev@...r.kernel.org>,
	Patrick McHardy <kaber@...sh.net>,
	Sven Riedel <sr@...urenet.de>,
	Netfilter Developer Mailing List 
	<netfilter-devel@...r.kernel.org>,
	"Dâniel Fraga" <fragabr@...il.com>,
	David Miller <davem@...emloft.net>
Subject: Re: TCP connection stalls under 2.6.24.7

On Fri, 18 Jul 2008, Thomas Jarosch wrote:

> On Thursday, 17. July 2008 17:53:01 Ilpo Järvinen wrote:
> > > > One option would be to disable reentry to FRTO when some progress was
> > > > made... Please try with the patch below...
> >
> > Ah, I just forgot that the situation might persist... Try with this
> > one instead...
> 
> Good news everyone: Two connections made it to the finish line.
> 
> The bad part: One transfer took four minutes, the other sixteen minutes.
> A colleague commented it's still much faster than carrying the message
> by plane ;-) A session without FRTO takes around 84 seconds.

...I guess if you limited ssthresh to some small value, you might beat 
that value even without FRTO.

> I've added debug printks() to every return path in tcp_use_frto(),
> so you can see what's going on. They look like this:
> 
> Jul 18 10:20:40 intratest131 kernel: [  957.318006] tcp_use_frto: ENTER: frto_counter: 0, icsk->icsk_ca_state: 0
> Jul 18 10:20:40 intratest131 kernel: [  957.318011] tcp_use_frto: DEFAULT RETURN 1;
> Jul 18 10:21:08 intratest131 kernel: [  984.446006] tcp_use_frto: ENTER: frto_counter: 3, icsk->icsk_ca_state: 0
> Jul 18 10:21:08 intratest131 kernel: [  984.446011] tcp_use_frto: RETURN in "tp->frto_counter > 1 || icsk->icsk_ca_state == TCP_CA_Loss"
> Jul 18 10:21:14 intratest131 kernel: [  991.058006] tcp_use_frto: ENTER: frto_counter: 0, icsk->icsk_ca_state: 0
> Jul 18 10:21:14 intratest131 kernel: [  991.058011] tcp_use_frto: DEFAULT RETURN 1;
> 
> Here are two new dumps and the corresponding debug traces:
> http://www.intra2net.com/de/download/tcpdump/tcp_frto_second_patch.tar.bz2

It seems that with FRTO the retransmission timeout grows much higher, which 
causes longer delays when things continue by RTO. This might simply be 
due to the fact that some timeouts indeed seem spurious, and with FRTO we 
can take RTT measurements from those. I'll keep digging deeper... The 
receiver is definitely doing something crazy as well, e.g.:

6.1.131.56060: . ack 1995587 win 65535
152.31.131.25: . 1998387:1999787(1400) ack 562 win 7504 (DF)
152.31.131.25: . 1999787:2001187(1400) ack 562 win 7504 (DF)
152.31.131.25: . 2001187:2002587(1400) ack 562 win 7504 (DF)
6.1.131.56060: . ack 1995587 win 8192 (DF)
6.1.131.56060: . ack 1996987 win 8192 (DF)
6.1.131.56060: . ack 1996987 win 8192 (DF)
6.1.131.56060: . ack 1996987 win 8192 (DF)

...The receiver shrunk the window here (it's not the only example) :-), 
though on the bright side, those are duplicate ACKs... :-D
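
A rough way to check for this kind of shrinking mechanically is to track the
right edge of the advertised window (ack + win) and flag any ACK that moves it
backwards. The sketch below is only an illustration: the function names are
made up, the (ack, win) pairs are copied from the receiver's ACKs in the
excerpt above, and it assumes the printed win values are the effective window
in bytes (no unapplied window scaling) and that sequence numbers do not wrap.

```python
# Sketch: detect a shrinking receive window from (ack, win) pairs.
# Assumptions: win is the effective window in bytes, no sequence wraparound.

def right_edge(ack, win):
    """First sequence number beyond what the receiver currently permits."""
    return ack + win

def find_shrinks(acks):
    """Return (index, old_edge, new_edge) for each ACK that moves the
    right window edge to the left of the previously advertised edge."""
    shrinks = []
    prev_edge = None
    for i, (ack, win) in enumerate(acks):
        edge = right_edge(ack, win)
        if prev_edge is not None and edge < prev_edge:
            shrinks.append((i, prev_edge, edge))
        prev_edge = edge
    return shrinks

# (ack, win) pairs taken from the receiver's ACKs in the trace excerpt.
trace = [
    (1995587, 65535),
    (1995587, 8192),
    (1996987, 8192),
    (1996987, 8192),
    (1996987, 8192),
]

for idx, old, new in find_shrinks(trace):
    print(f"ACK #{idx}: right edge moved back from {old} to {new} "
          f"({old - new} bytes)")
```

With the values above, only the second ACK is flagged: the ack number stays
at 1995587 while the window drops from 65535 to 8192, so the right edge moves
back. The later ACKs advance the edge again, so they are not reported.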

Btw, which kernel did you run these on (I hope it wasn't 2.6.24.7, 
which has FRTO-related bugs anyway that the patches I've sent now won't 
fix)? 

-- 
 i.
