netdev - Re: [PATCH] revert TCP retransmission backoff on ICMP destination unreachable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-id: <4A926542.3020409@tvk.rwth-aachen.de>
Date:	Mon, 24 Aug 2009 12:02:42 +0200
From:	Damian Lukowski <damian@....rwth-aachen.de>
To:	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc:	Netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] revert TCP retransmission backoff on ICMP destination
 unreachable

> On Wed, 19 Aug 2009, Damian Lukowski wrote:
> 
>> Ok, I will fully quote previous messages now. Just wanted to avoid
>> redundancy.
> 
> Full quoting is not what I meant, Dave explained what I meant in his mail.
> 
>> Anyway, as long as the draft is not accepted as RFC, the
>> patch will violate current standards, so I doubt, Dave would like to
>> apply it soon. :)
> 
> ...Don't be too sure on that ;-).
> 
>> >>>> +        icsk->icsk_backoff--;
>> >>>> +        icsk->icsk_rto >>= 1;
>> >>> Are you absolute sure that we don't go to zero here when phase of the
>> >>> moon is just right? I'd put a max(..., 1) guard there.
>> >> Well, the retransmission timer doubles icsk_rto whenever it increases
>> >> icsk_backoff, so we should reach the original icsk_rto, when
>> >> icsk_backoff becomes zero.
>> > > This is a false assumption. Take a look into the formula, there's
>> a > cap how far icsk_rto goes but icsk_backoff keeps increasing. In
>> fact, > there would probably be a need to set some other lower bound
>> than 1, > TCP_RTO_MIN would probably be more appropriate for this.
>>
>> *sigh* I should have seen that. I would like to preserve the initial rto
>> value and not work with RTO_MIN. First, I have thought about storing the
>> initial rto in another state variable. But what, if we simply
>> recalculate the value?:
>> ###
>>     icsk->icsk_backoff--;
>> -    icsk->icsk_rto >>= 1;
>> +    tcp_set_rto(sk);
>> +    icsk->icsk_rto <<= icsk->icsk_backoff;
>> +    tcp_bound_rto(sk);
>> ###
> 
> Yes, this is sort of a problem. I'd recommend that you read through all
> icsk_rto readers and writers and see if recalculation can be done
> without getting into some problems.

I was going to take your previous max() solution for now, as my proposed solution
has not reflected the current version of the draft. However, I discussed with the
authors of the draft, and they will submit a new version of it, such that a doubling
of an RTO of 100s will not be "reverted" to 60s.

>> >>>> +        BUG_ON(!skb_r);
>> >>>> +
>> >>>> +        if (sock_owned_by_user(sk)) {
>> >>>> +            /* Deferring retransmission clocked by ICMP
>> >>>> +             * due to locked socket. */
>> >>>> +            inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
>> >>>> +            min(icsk->icsk_rto, TCP_RESOURCE_PROBE_INTERVAL),
>> >>> ??? (besides having indent wrong). What makes you think HZ/2 is right
>> >>> value if icsk_rto is large?
>> >> To be honest, I have taken over the code from my predecessor and
>> didn't
>> >> question this part.
>> > > But you did Signoff that regardless, so you should know ;-).
>>
>> Only because checkpatch.pl didn't like it otherwise. :)
> 
> ...And you dare to admit :-) ...It certainly has some other meaning
> beyond checkpatch complain, for future, please check and understand what
> it means that you add it there.
> 
>> >>>> +            inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
>> >>>> +                          icsk->icsk_rto, TCP_RTO_MAX);
>> >>> RFC 2988, step 5.5 missing? Have you verified this patch for real?
>> >> Thats the whole point of the draft. If consecutive retransmissions
>> >> trigger fitting ICMPs, the RTO value is not doubled, but remains
>> >> constant.
>> > > No, either the draft is wrong or the implementation. 5.5 step
>> should be > applied. Now you end up decreasing RTO per icmp which
>> causes imminent > retransmission instead of keeping it "constant".
>> Effectively, step 4, 5, 6 > (halves like you do), 7 -> R, R is 5.4-5.6
>> and 5.5 increases RTO again > because the timer _expired_ here even
>> though we did short-circuit it > instead of setting a zero timer and
>> using the normal procedure (which in > itself is ok approach, no need
>> to force timer to expire).
>> > >> So you can have a retransmission schedule of
>> >> t, t+r, t+2r, t+3r, t+4r,  t+5r  ...
>> >> instead of
>> >> t, t+r, t+3r, t+7r, t+15r, t+31r ...
>> > > Sure, but that's not what happens, instead you will get this series:
>> > > t, t+r, (t+r)/2, (t+r)/4, ...
>> > > ...disclaimer: it might be off-by-something somewhere but I hope
>> you get > my point anyway.
>>
>> It seems to me, that we focus on different things. From the TCP code's
>> point of view, RFC2988 step 5.5 is of course executed, because timeouts
>> still trigger tcp_retransmit_timer(), which doubles the RTO at the end.
> 
> No and yes. For no: you artificially made tcp_retransmit_timer to not be
> called whenever the ICMP arrives at a point where RTO/2 step causes the
> timer to expire right away. But yes: if you change it to call
> tcp_retransmit_timer as we have discussed, 5.5 should be there (but
> that's not in original patch which I was commenting).
> 
>> But from the network's point of view, there is no doubling, but constant
>> intervals between retransmissions.
> 
> Plain wrong, you would only end up halving as per icmp when the timing
> is right to produce the scenario I'm talking. Please show from the
> original patch
> 
>> The following happens with patched TCP (not considering RTO_MAX):
>> t:        Connectivity breaks, rto is r
>> t+r:        First RTO fires, SND.UNA is retransmitted by
>>         tcp_retransmit_timer() and rto := 2rto
>>         Now it is rto == 2r
>> t+r+epsilon:    ICMP unreach arrives from a router.
>>         The patch sets rto := rto/2
>>         Now it is rto == r
>> t+2r:        Second RTO fires, SND.UNA is retransmitted by
>>         tcp_retransmit_timer() and rto := 2rto
>>         Now it is rto == 2r
>>         Assume, that this retransmission did not trigger
>>         any ICMP message.
>> t+4r:        Third RTO fires, SND.UNA is retransmitted by
>>         tcp_retransmit_timer() and rto := 2rto
>>         Now it is rto == 4r
>>         Assume, that this retransmission did not trigger
>>         any ICMP message.
>> t+8r:        Fourth RTO fires, SND.UNA is retransmitted by
>>         tcp_retransmit_timer() and rto := 2rto
>>         Now it is rto == 8r
>> t+8r+epsilon:    ICMP unreach arrives from a router.
>>         The patch sets rto := rto/2
>>         Now it is rto == 4r
>> t+12r:        Fifth RTO fires ...
> 
> No, that was not the scenario I'm talking about. Let me try to explain:
> 
>> t:        Connectivity breaks, rto is r
>> t+r:        First RTO fires, SND.UNA is retransmitted by
>>         tcp_retransmit_timer() and rto := 2rto
>>         Now it is rto == 2r
> t+2r+epsilon:    ICMP unreach arrives from a router.
>>         The patch sets rto := rto/2
>         t+2r+epsilon > t+r+rto => call for retransmit in the icmp code
>         set rto = r
> t+5/2r+epsilon ICMP unreach arrives from a router.
>         The patch sets rto := rto/2
>         t+5/2r+epsilon > t+something => call for retransmit
>         set rto = r/2
> ...
> 
> See, it's a downward spiral because nothing in the original patch takes
> care of the rto*2 (ie., step 5.5) when RTO is artificially _caused by_
> the ICMP (ie., the halving does make expiry to happen right away, see
> draft step 7).
> 
>> >> I can show you trace plots of patched an unpatched TCP, if you like.
>> >> Should I attach files in the mails, or better post hyperlinks to them?
>> > > Sure, but they must not be from a trivial case but the rto must >
>> "flunctuate"
>> How should they fluctuate? RTO is doubled by tcp_retransmit_timer() no
>> matter what happens, and it is halved by the the patch, if an
>> appropriate ICMP packet has been received.
> 
> The original patch doesn't agree. If you'd have used
> tcp_retransmit_timer in the original I'd agree but you called for
> tcp_retransmit_skb(skb_r) and set the next rto yourself which wouldn't
> let tcp_retransmit_timer to run at all in the scenario I was talking
> about. But lets see the next version of the patch first and only then
> continue the discussion on this particular point about 5.5 (probably not
> then necessary anymore because
> of a change we've discussed).
> 
>> > and the expirys must be using the "imminent" branch for it
>> > to count for me as a proof that you did it right.
>>
>> I'm not sure what you mean.
> 
> The case I'm talking about is the branch where we notice that after
> doing rto/2 the RTO is to be scheduled right away (see step 7 in the
> draft).
> 
>> Still, hyperlinks or attachments?
> 
> Either is fine, but large stuff as links. However, honestly I'm not that
> interest in them as I know that if you have the traces from the
> particular case I'm talking about, they would show the misbehavior
> (other cases work but they do not interest me at all). It would have
> been mostly just to convince you that this misbehavior is there, since
> my words didn't seem to make it :-). I propose we put this "proving"
> aside too for now since the changes we've already discussed should make
> this a non-issue anyway,
> and therefore no need to waste time on that.

Yes, now I see your point. You are right, the old patch would not double anything in your
scenario. And of course there are no plots for that case.  :) 
I will combine your suggested remaining RTO calculation with tcp_retransmit_timer()
instead of retransmit_skb(), which will perform the doubling also in the imminent branch.

I think, I can use all your suggestions for a v2 Patch, which I will post as a new thread.

Thank you.
 Damian

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html