[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-id: <4A926542.3020409@tvk.rwth-aachen.de>
Date: Mon, 24 Aug 2009 12:02:42 +0200
From: Damian Lukowski <damian@....rwth-aachen.de>
To: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc: Netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] revert TCP retransmission backoff on ICMP destination
unreachable
> On Wed, 19 Aug 2009, Damian Lukowski wrote:
>
>> Ok, I will fully quote previous messages now. Just wanted to avoid
>> redundancy.
>
> Full quoting is not what I meant, Dave explained what I meant in his mail.
>
>> Anyway, as long as the draft is not accepted as RFC, the
>> patch will violate current standards, so I doubt, Dave would like to
>> apply it soon. :)
>
> ...Don't be too sure on that ;-).
>
>> >>>> + icsk->icsk_backoff--;
>> >>>> + icsk->icsk_rto >>= 1;
>> >>> Are you absolute sure that we don't go to zero here when phase of the
>> >>> moon is just right? I'd put a max(..., 1) guard there.
>> >> Well, the retransmission timer doubles icsk_rto whenever it increases
>> >> icsk_backoff, so we should reach the original icsk_rto, when
>> >> icsk_backoff becomes zero.
>> > > This is a false assumption. Take a look into the formula, there's
>> a > cap how far icsk_rto goes but icsk_backoff keeps increasing. In
>> fact, > there would probably be a need to set some other lower bound
>> than 1, > TCP_RTO_MIN would probably be more appropriate for this.
>>
>> *sigh* I should have seen that. I would like to preserve the initial rto
>> value and not work with RTO_MIN. First, I have thought about storing the
>> initial rto in another state variable. But what, if we simply
>> recalculate the value?:
>> ###
>> icsk->icsk_backoff--;
>> - icsk->icsk_rto >>= 1;
>> + tcp_set_rto(sk);
>> + icsk->icsk_rto <<= icsk->icsk_backoff;
>> + tcp_bound_rto(sk);
>> ###
>
> Yes, this is sort of a problem. I'd recommend that you read through all
> icsk_rto readers and writers and see if recalculation can be done
> without getting into some problems.
I was going to take your previous max() solution for now, as my proposed solution
has not reflected the current version of the draft. However, I discussed with the
authors of the draft, and they will submit a new version of it, such that a doubling
of an RTO of 100s will not be "reverted" to 60s.
>> >>>> + BUG_ON(!skb_r);
>> >>>> +
>> >>>> + if (sock_owned_by_user(sk)) {
>> >>>> + /* Deferring retransmission clocked by ICMP
>> >>>> + * due to locked socket. */
>> >>>> + inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
>> >>>> + min(icsk->icsk_rto, TCP_RESOURCE_PROBE_INTERVAL),
>> >>> ??? (besides having indent wrong). What makes you think HZ/2 is right
>> >>> value if icsk_rto is large?
>> >> To be honest, I have taken over the code from my predecessor and
>> didn't
>> >> question this part.
>> > > But you did Signoff that regardless, so you should know ;-).
>>
>> Only because checkpatch.pl didn't like it otherwise. :)
>
> ...And you dare to admit :-) ...It certainly has some other meaning
> beyond checkpatch complain, for future, please check and understand what
> it means that you add it there.
>
>> >>>> + inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
>> >>>> + icsk->icsk_rto, TCP_RTO_MAX);
>> >>> RFC 2988, step 5.5 missing? Have you verified this patch for real?
>> >> Thats the whole point of the draft. If consecutive retransmissions
>> >> trigger fitting ICMPs, the RTO value is not doubled, but remains
>> >> constant.
>> > > No, either the draft is wrong or the implementation. 5.5 step
>> should be > applied. Now you end up decreasing RTO per icmp which
>> causes imminent > retransmission instead of keeping it "constant".
>> Effectively, step 4, 5, 6 > (halves like you do), 7 -> R, R is 5.4-5.6
>> and 5.5 increases RTO again > because the timer _expired_ here even
>> though we did short-circuit it > instead of setting a zero timer and
>> using the normal procedure (which in > itself is ok approach, no need
>> to force timer to expire).
>> > >> So you can have a retransmission schedule of
>> >> t, t+r, t+2r, t+3r, t+4r, t+5r ...
>> >> instead of
>> >> t, t+r, t+3r, t+7r, t+15r, t+31r ...
>> > > Sure, but that's not what happens, instead you will get this series:
>> > > t, t+r, (t+r)/2, (t+r)/4, ...
>> > > ...disclaimer: it might be off-by-something somewhere but I hope
>> you get > my point anyway.
>>
>> It seems to me, that we focus on different things. From the TCP code's
>> point of view, RFC2988 step 5.5 is of course executed, because timeouts
>> still trigger tcp_retransmit_timer(), which doubles the RTO at the end.
>
> No and yes. For no: you artificially made tcp_retransmit_timer to not be
> called whenever the ICMP arrives at a point where RTO/2 step causes the
> timer to expire right away. But yes: if you change it to call
> tcp_retransmit_timer as we have discussed, 5.5 should be there (but
> that's not in original patch which I was commenting).
>
>> But from the network's point of view, there is no doubling, but constant
>> intervals between retransmissions.
>
> Plain wrong, you would only end up halving as per icmp when the timing
> is right to produce the scenario I'm talking. Please show from the
> original patch
>
>> The following happens with patched TCP (not considering RTO_MAX):
>> t: Connectivity breaks, rto is r
>> t+r: First RTO fires, SND.UNA is retransmitted by
>> tcp_retransmit_timer() and rto := 2rto
>> Now it is rto == 2r
>> t+r+epsilon: ICMP unreach arrives from a router.
>> The patch sets rto := rto/2
>> Now it is rto == r
>> t+2r: Second RTO fires, SND.UNA is retransmitted by
>> tcp_retransmit_timer() and rto := 2rto
>> Now it is rto == 2r
>> Assume, that this retransmission did not trigger
>> any ICMP message.
>> t+4r: Third RTO fires, SND.UNA is retransmitted by
>> tcp_retransmit_timer() and rto := 2rto
>> Now it is rto == 4r
>> Assume, that this retransmission did not trigger
>> any ICMP message.
>> t+8r: Fourth RTO fires, SND.UNA is retransmitted by
>> tcp_retransmit_timer() and rto := 2rto
>> Now it is rto == 8r
>> t+8r+epsilon: ICMP unreach arrives from a router.
>> The patch sets rto := rto/2
>> Now it is rto == 4r
>> t+12r: Fifth RTO fires ...
>
> No, that was not the scenario I'm talking about. Let me try to explain:
>
>> t: Connectivity breaks, rto is r
>> t+r: First RTO fires, SND.UNA is retransmitted by
>> tcp_retransmit_timer() and rto := 2rto
>> Now it is rto == 2r
> t+2r+epsilon: ICMP unreach arrives from a router.
>> The patch sets rto := rto/2
> t+2r+epsilon > t+r+rto => call for retransmit in the icmp code
> set rto = r
> t+5/2r+epsilon ICMP unreach arrives from a router.
> The patch sets rto := rto/2
> t+5/2r+epsilon > t+something => call for retransmit
> set rto = r/2
> ...
>
> See, it's a downward spiral because nothing in the original patch takes
> care of the rto*2 (ie., step 5.5) when RTO is artificially _caused by_
> the ICMP (ie., the halving does make expiry to happen right away, see
> draft step 7).
>
>> >> I can show you trace plots of patched an unpatched TCP, if you like.
>> >> Should I attach files in the mails, or better post hyperlinks to them?
>> > > Sure, but they must not be from a trivial case but the rto must >
>> "flunctuate"
>> How should they fluctuate? RTO is doubled by tcp_retransmit_timer() no
>> matter what happens, and it is halved by the the patch, if an
>> appropriate ICMP packet has been received.
>
> The original patch doesn't agree. If you'd have used
> tcp_retransmit_timer in the original I'd agree but you called for
> tcp_retransmit_skb(skb_r) and set the next rto yourself which wouldn't
> let tcp_retransmit_timer to run at all in the scenario I was talking
> about. But lets see the next version of the patch first and only then
> continue the discussion on this particular point about 5.5 (probably not
> then necessary anymore because
> of a change we've discussed).
>
>> > and the expirys must be using the "imminent" branch for it
>> > to count for me as a proof that you did it right.
>>
>> I'm not sure what you mean.
>
> The case I'm talking about is the branch where we notice that after
> doing rto/2 the RTO is to be scheduled right away (see step 7 in the
> draft).
>
>> Still, hyperlinks or attachments?
>
> Either is fine, but large stuff as links. However, honestly I'm not that
> interest in them as I know that if you have the traces from the
> particular case I'm talking about, they would show the misbehavior
> (other cases work but they do not interest me at all). It would have
> been mostly just to convince you that this misbehavior is there, since
> my words didn't seem to make it :-). I propose we put this "proving"
> aside too for now since the changes we've already discussed should make
> this a non-issue anyway,
> and therefore no need to waste time on that.
Yes, now I see your point. You are right, the old patch would not double anything in your
scenario. And of course there are no plots for that case. :)
I will combine your suggested remaining RTO calculation with tcp_retransmit_timer()
instead of retransmit_skb(), which will perform the doubling also in the imminent branch.
I think, I can use all your suggestions for a v2 Patch, which I will post as a new thread.
Thank you.
Damian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists