[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091130213727.2f4047d2@houba>
Date: Mon, 30 Nov 2009 21:37:27 +0100
From: Frederic Leroy <fredo@...rox.org>
To: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
Cc: Netdev <netdev@...r.kernel.org>, Asdo <asdo@...ftmail.org>
Subject: Re: scp stalls mysteriously
Le Mon, 30 Nov 2009 22:18:22 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi> a écrit :
> On Mon, 30 Nov 2009, Frederic Leroy wrote:
>
> Thanks, at least ....3.... shows that cwnd (congestion window, the
> 2nd to last field in /proc/net/tcp) goes to zero which it never ever
> should. ...I'll start reading some code to figure out the cause for
> that.
>
> How much is your RTT btw?
The RTT is ~ 1.4ms.
The following trace has a rtt as big as 5ms but most of the time, it's
under 1.4ms and periodically grow to 3ms.
-------------------------------------------
64 bytes from sg (192.168.1.19): icmp_seq=681 ttl=64 time=1.26 ms
64 bytes from sg (192.168.1.19): icmp_seq=682 ttl=64 time=1.26 ms
64 bytes from sg (192.168.1.19): icmp_seq=683 ttl=64 time=1.34 ms
64 bytes from sg (192.168.1.19): icmp_seq=684 ttl=64 time=2.01 ms
64 bytes from sg (192.168.1.19): icmp_seq=686 ttl=64 time=2.15 ms
64 bytes from sg (192.168.1.19): icmp_seq=687 ttl=64 time=5.07 ms
64 bytes from sg (192.168.1.19): icmp_seq=688 ttl=64 time=1.19 ms
64 bytes from sg (192.168.1.19): icmp_seq=689 ttl=64 time=1.30 ms
64 bytes from sg (192.168.1.19): icmp_seq=690 ttl=64 time=1.19 ms
64 bytes from sg (192.168.1.19): icmp_seq=691 ttl=64 time=1.14 ms
-------------------------------------------
> The RTO timeout easily goes up to 120
> secods similar to what I found myself already. Thus I'm not sure if
> this stall would be "infinite" after all.
I notice that grep '":0016" /proc/net/tcp' stop to report something when
it stalls. Is the connection closed ?
> I've some problem in .3. case in matching timing of the proc/net/tcp
> dump and the tcpdump ... did you perhaps cut it somewhere?
No, I didn't cut it.
> This might
> point out to yet another problem, however, I'm not sure what I can
> atm trust in the logs. ...Timestamps would probably help me a bit
> (I've changed also sleep to do 0.01 granularity):
>
> while [ : ]; do date "+%s.%N"; grep ":0016" /proc/net/tcp; sleep
> 0.01; done | tee scp_stall-houba.x.proc_net_tcp
> (Please run start while loop before starting the transfer and all the
> way until the connection is already killed and only then stop the
> loop).
I took care to timings this time. I did as you said.
> What happened in .2. case I've very little idea, proc/net/tcp does
> claim retransmission would have happened on TCP layer but that never
> went into tcpdump log (the kind of thing might have happened in
> the .3. case too which is why I asked for the timestamps for matching
> captures more exactly but there were some retransmissions earlier in
> the log so I cannot be sure). Can you also check that tcpdump doesn't
> lose any packets, it should show: "0 packets dropped by kernel" when
> you ctrl-c it.
I still have the logs, O packet dropped :
-------------------------------------------
houba tmp # tcpdump -i wlan1 -s 1500 -w /tmp/ssh_stall-houba.1.pcap -vv
ether host 00:40:45:09:3f:3f tcpdump: listening on wlan1, link-type
EN10MB (Ethernet), capture size 1500 bytes ^C8923 packets captured
8923 packets received by filter
0 packets dropped by kernel
houba tmp # tcpdump -i wlan1 -s 1500 -w /tmp/scp_stall-houba.3.pcap
-vv ether host 00:40:45:09:3f:3f tcpdump: listening on wlan1, link-type
EN10MB (Ethernet), capture size 1500 bytes ^C17045 packets captured
17045 packets received by filter
0 packets dropped by kernel
houba tmp #
-------------------------------------------
--
Frédéric Leroy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists