netdev - Re: scp stalls mysteriously

Message-ID: <alpine.DEB.2.00.0912032207430.776@melkinpaasi.cs.helsinki.fi>
Date:	Thu, 3 Dec 2009 22:36:53 +0200 (EET)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Frederic Leroy <fredo@...rox.org>
cc:	Damian Lukowski <damian@....rwth-aachen.de>,
	Netdev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Greg KH <gregkh@...e.de>
Subject: Re: scp stalls mysteriously

On Thu, 3 Dec 2009, Frederic Leroy wrote:

> On Thu, 03 Dec 2009 15:10:11 +0100,
> Damian Lukowski <damian@....rwth-aachen.de> wrote:
> > > I suppose adding || !tp->retrans_stamp into the false condition is
> > > fine as long as we don't then end up with a case that can cause a
> > > connection to hang there forever for some reason (this needs to be
> > > understood well enough, not just test driven in stables :-)).
> > > 
> > >> Unluckily, I still cannot reproduce the scp stalls here, so it
> > >> would be nice if Frederic printed retrans_stamp together with
> > >> icsk_ca_state and icsk_retransmits, please.
> > > 
> > > It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
> > > that might have some significance w.r.t. what happens because of FRTO.
> > 
> > I made a patch for Frederic with Codebase
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> > 
> > Thanks for testing.
> 
> I made a new .11 trace with Damian's patch.
> The copy ran to completion.
> 
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.pcap.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.dmesg.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-before.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-after.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.sysctl_net.ipv4.tcp-before.bz2

So the || !tp->retrans_stamp certainly solves the silent death; however,
the annoying thing is that now we didn't get the zero events logged :-).
Can you please print them in that branch too? Also, I forgot previously
that tp->frto_counter would be nice to know. And since there seems to be a
need for yet another case, in order to solve the other problem related to
fast retransmit, tp->lost_out and tp->sacked_out would also be nice to
know.

MIBs tell:
...24 SACK recoveries started but never any retransmission in them...
...DSACK undo counter is probably miscounting, 28 is way too much to be 
realistic for those two DSACKs received :-). It might relate to this not 
retransmitting anything.
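
For anyone who wants to re-check the MIB numbers against the before/after
snapshots, something like the following could be used to diff the counters.
This is only a rough sketch and not part of the patch: it assumes the usual
/proc/net/netstat layout (a line of counter names followed by a line of
values carrying the same prefix), the helper names parse_netstat and
diff_counters are made up for illustration, and the sample numbers are
invented rather than taken from Frederic's files. TCPSackRecovery and
TCPDSACKUndo are, as far as I know, the TcpExt counters behind the two
observations above.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"Prefix.Name": int}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: a header line of names, then a line of values,
    # both starting with the same "Prefix:" token.
    for header, values in zip(lines[0::2], lines[1::2]):
        names = header.split()
        nums = values.split()
        if names[0] != nums[0] or len(names) != len(nums):
            raise ValueError("header/value lines do not match: %r" % names[0])
        prefix = names[0].rstrip(":")
        for name, num in zip(names[1:], nums[1:]):
            counters[prefix + "." + name] = int(num)
    return counters


def diff_counters(before, after):
    """Return only the counters that changed, as after minus before."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }


if __name__ == "__main__":
    # Invented sample snapshots (NOT the data from the trace above).
    sample_before = (
        "TcpExt: TCPSackRecovery TCPFastRetrans TCPDSACKUndo\n"
        "TcpExt: 10 7 3\n"
    )
    sample_after = (
        "TcpExt: TCPSackRecovery TCPFastRetrans TCPDSACKUndo\n"
        "TcpExt: 15 7 9\n"
    )
    delta = diff_counters(parse_netstat(sample_before),
                          parse_netstat(sample_after))
    for name in sorted(delta):
        print(name, delta[name])
```

With the real files one would decompress the two proc_net_netstat snapshots
first and feed their contents to parse_netstat. A SACK recovery delta with
no matching change in the retransmission counters would be the pattern
described above.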

-- 
 i.