lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 30 Nov 2009 22:18:22 +0200 (EET) From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi> To: Frederic Leroy <fredo@...rox.org> cc: Netdev <netdev@...r.kernel.org>, Asdo <asdo@...ftmail.org> Subject: Re: scp stalls mysteriously On Mon, 30 Nov 2009, Frederic Leroy wrote: > Le Sat, 28 Nov 2009 13:31:14 +0200 (EET), > "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi> a écrit : > > > On Sat, 28 Nov 2009, Frederic Leroy wrote: > > > > > Le Sat, 28 Nov 2009 00:12:23 +0200 (EET), > > > "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi> a écrit : > > > > > > > On Fri, 27 Nov 2009, Frederic Leroy wrote: > > > > > > > > > I put traces of stall here : > > > > > http://www.starox.org/pub/scp_stall/ > > [...] > > Perhaps having the /proc/net/tcp would at least tell what state the > > timer is (if I cannot reproduce right away). ...It is rather strange > > that two independent mechanisms for loss recovery seem both to fail > > to get triggered here, no traces of retransmission whatsoever. I > > think it is for now enough to concentrate on what happens on > > 192.168.1.15 (=houba?) and get tcpdump and proc/net/tcp from there, > > the other end/direction has very little significance here (except for > > the fact that bidirectionality might be needed to actually trigger > > it). You could even think of getting proc/net/tcp a bit more often, > > right from the start: > > > > while [ : ]; do grep ":0016" /proc/net/tcp; sleep 0.1; done | tee > > scp_stall-houba.x.proc_net_tcp > > > > ...Please wait at least 2 minutes before hitting ctrl-c or otherwise > > artificially intervening. > > You're right, 192.168.1.15 is houba. > The faulty behaviour comes back today, I did new captures on > 192.168.1.15 only : > http://www.starox.org/pub/scp_stall/scp_stall-houba.2.pcap > http://www.starox.org/pub/scp_stall/scp_stall-houba.2.proc_net_tcp > http://www.starox.org/pub/scp_stall/scp_stall-houba.3.pcap > http://www.starox.org/pub/scp_stall/scp_stall-houba.3.proc_net_tcp > > Which tool do you use to analyze /proc/net/tcp ? Thanks, at least ....3.... shows that cwnd (congestion window, the 2nd to last field in /proc/net/tcp) goes to zero which it never ever should. ...I'll start reading some code to figure out the cause for that. How much is your RTT btw? The RTO timeout easily goes up to 120 secods similar to what I found myself already. Thus I'm not sure if this stall would be "infinite" after all. I've some problem in .3. case in matching timing of the proc/net/tcp dump and the tcpdump ... did you perhaps cut it somewhere? This might point out to yet another problem, however, I'm not sure what I can atm trust in the logs. ...Timestamps would probably help me a bit (I've changed also sleep to do 0.01 granularity): while [ : ]; do date "+%s.%N"; grep ":0016" /proc/net/tcp; sleep 0.01; done | tee scp_stall-houba.x.proc_net_tcp (Please run start while loop before starting the transfer and all the way until the connection is already killed and only then stop the loop). What happened in .2. case I've very little idea, proc/net/tcp does claim retransmission would have happened on TCP layer but that never went into tcpdump log (the kind of thing might have happened in the .3. case too which is why I asked for the timestamps for matching captures more exactly but there were some retransmissions earlier in the log so I cannot be sure). Can you also check that tcpdump doesn't lose any packets, it should show: "0 packets dropped by kernel" when you ctrl-c it. -- i.
Powered by blists - more mailing lists