netdev - Re: scp stalls mysteriously

Date:	Sun, 06 Dec 2009 23:44:21 +0100
From:	Damian Lukowski <damian@....rwth-aachen.de>
To:	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc:	Frederic Leroy <fredo@...rox.org>, Netdev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Greg KH <gregkh@...e.de>
Subject: Re: scp stalls mysteriously

Ilpo Järvinen wrote:
> On Sat, 5 Dec 2009, Damian Lukowski wrote:
> 
>> Frederic Leroy wrote:
>>> On Thu, 03 Dec 2009 15:10:11 +0100,
>>> Damian Lukowski <damian@....rwth-aachen.de> wrote:
>>>>> I suppose adding || !tp->retrans_stamp into the false condition is
>>>>> fine as long as we don't then end up with a case that can cause a
>>>>> connection to hang there forever for some reason (this needs to be
>>>>> understood well enough, not just test-driven in stables :-)).
>>>>>
>>>>>> Unluckily, I still cannot reproduce the scp stalls here, so it
>>>>>> would be nice if Frederic printed retrans_stamp together with
>>>>>> icsk_ca_state and icsk_retransmits, please.
>>>>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
>>>>> that might have some significance w.r.t. what happens because of FRTO.
>>>> I made a patch for Frederic against the codebase
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>>>>
>>>> Thanks for testing.
>>> I made a new .11 trace with Damian's patch.
>>> The copy ran to completion.
>> Could you please make another test and unplug the cable or drop ACKs for
>> several seconds, so that some RTO retransmissions are performed?
>> I'd like to see if retrans_stamp remains constant. In the dmesg output of
>> the 11th run, it seems to change while icsk_retransmits also increases.
>> This is kind of bad for connection timeout calculation in the RTO case ...
> 
> 
> ...Good point, I think that's another bug in this area. We should prevent
> the retrans_stamp update when the RTO itself caused tp->retrans_out to
> become zero. This bug also affects other things that are based on
> retrans_stamp, not only your code, but there the effect is just less
> devastating.
> 
> ...Anyway, we already know what happens from reading the code, now that we
> know which case to look for. If tcp_retransmit_skb encounters any error,
> retrans_stamp is not changed (besides, unplugging the cable might not be
> enough if you don't lose the route; what happens depends on configuration).
> And if retrans_out is zero, the stamp gets updated (assuming the rexmit was
> made successfully, without some error condition).
> 
> What I think we need is something like this to handle all error cases
> cleanly (plus to fix the other bug I mentioned above):
> 
> 	if (!icsk->icsk_retransmits)
> 		return false;
> 
> 	if (unlikely(!tp->retrans_stamp)) {
> 		start_ts = TCP_SKB_CB(tcp_write_queue_head(sk))->when;
> 	} else {
> 		start_ts = tp->retrans_stamp;
> 	}
> 
> ...and then use that timestamp in the subtraction. It handles the case
> where retrans_stamp was never updated (admittedly there are corner cases
> even then where the connection doesn't die exactly when we would want, when
> retrans_stamp got set on some high icsk_retransmits, but still it would
> only be less than double the specified time). ...We just need to audit
> that tcp_write_queue_head and when are always valid when the call happens
> (I think they should be, but that has to be checked).

retransmits_timed_out() is called directly by tcp_retransmit_timer() and by
tcp_write_timeout(), which itself is only called by tcp_retransmit_timer().
As tcp_retransmit_timer() calls tcp_write_queue_head() at two points,
I think it is safe to assume that the queue head will be valid in
retransmits_timed_out() as well.
I have also checked that every call of tcp_transmit_skb() is preceded by
an update of cb->when, so it should be set and valid in
retransmits_timed_out(). Do you agree?
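
For illustration only, the start-timestamp fallback quoted above can be
sketched as a small user-space model. This is not kernel code and not a
proposed patch: struct conn_model, head_when, pick_start_ts() and
timed_out_model() are hypothetical names standing in for the socket state,
TCP_SKB_CB(tcp_write_queue_head(sk))->when, and the check inside
retransmits_timed_out(). The timeout is simply passed in as a parameter
instead of being derived from the RTO backoff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the fields discussed in this thread. */
struct conn_model {
	uint32_t retransmits;   /* models icsk->icsk_retransmits */
	uint32_t retrans_stamp; /* models tp->retrans_stamp, 0 = never set */
	uint32_t head_when;     /* models TCP_SKB_CB(write queue head)->when */
};

/* Use retrans_stamp if it was ever set, else fall back to the send
 * time of the segment at the head of the write queue. */
static uint32_t pick_start_ts(const struct conn_model *c)
{
	return c->retrans_stamp ? c->retrans_stamp : c->head_when;
}

/* Return true once at least 'timeout' ticks have passed since the chosen
 * start timestamp. Unsigned subtraction keeps the comparison correct
 * across a wrap of the 32-bit tick counter. */
static bool timed_out_model(const struct conn_model *c,
			    uint32_t now, uint32_t timeout)
{
	if (!c->retransmits)
		return false;

	return (uint32_t)(now - pick_start_ts(c)) >= timeout;
}
```

With retrans_stamp unset, the model measures elapsed time from head_when,
which is the situation the audit of tcp_write_queue_head() and cb->when
is meant to make safe.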

Anyway, how should we proceed now? Should I make a patch and post it? If
yes, on which mailing list and against which codebase? Or maybe it should
be posted as part of a series of patches which also fixes the ENOMEM issue
(which I do not really have a clue about)?

Damian