Message-ID: <alpine.DEB.2.00.0912032207430.776@melkinpaasi.cs.helsinki.fi>
Date: Thu, 3 Dec 2009 22:36:53 +0200 (EET)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: Frederic Leroy <fredo@...rox.org>
cc: Damian Lukowski <damian@....rwth-aachen.de>,
Netdev <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Greg KH <gregkh@...e.de>
Subject: Re: scp stalls mysteriously
On Thu, 3 Dec 2009, Frederic Leroy wrote:
> On Thu, 03 Dec 2009 15:10:11 +0100,
> Damian Lukowski <damian@....rwth-aachen.de> wrote:
> > > I suppose adding || !tp->retrans_stamp to the return-false condition
> > > is fine, as long as it can't then leave a connection hanging there
> > > forever for some reason (this needs to be understood well enough, not
> > > just test-driven in the stable kernels :-)).
> > >
> > >> Unfortunately, I still cannot reproduce the scp stalls here, so it
> > >> would be nice if Frederic printed retrans_stamp together with
> > >> icsk_ca_state and icsk_retransmits, please.
> > >
> > > It wouldn't hurt to know tp->packets_out and tp->retrans_out too;
> > > they might be significant w.r.t. what happens because of FRTO.
> >
> > I made a patch for Frederic against the codebase at
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> >
> > Thanks for testing.
>
> I made a new .11 trace with Damian's patch.
> The copy ran to completion.
>
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.pcap.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.dmesg.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-before.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-after.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.sysctl_net.ipv4.tcp-before.bz2
So the || !tp->retrans_stamp certainly solves the silent death; the
annoying thing, however, is that now we didn't get the zero events logged
:-). Can you please also print them in that branch? Also, I forgot earlier
that tp->frto_counter would be nice to know. And since there seems to be a
need for yet another case to solve the other problem, the one related to
fast retransmit, tp->lost_out and tp->sacked_out would be nice to know
too.
The MIBs show:
...24 SACK recoveries started, but never any retransmission in them...
...the DSACK undo counter is probably miscounting; 28 is way too much to be
realistic for the two DSACKs received :-). It might be related to nothing
being retransmitted.
--
i.