Message-ID: <5640c7e00708291510p778f387w51d50e981ba49a25@mail.gmail.com>
Date: Thu, 30 Aug 2007 10:10:37 +1200
From: "Ian McDonald" <ian.mcdonald@...di.co.nz>
To: "David Miller" <davem@...emloft.net>
Cc: rick.jones2@...com, netdev@...r.kernel.org
Subject: Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
On 8/30/07, David Miller <davem@...emloft.net> wrote:
> From: "Ian McDonald" <ian.mcdonald@...di.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
>
> Everyone uses this value, even BSD since ancient times.
>
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
>
I understand what you are saying. That is why I questioned it: 200 msecs
makes no sense on a LAN with an RTT under 1 msec. So if the current value
is ridiculous and 1000 is even more so, why do we use it at all? I'm
guessing it's just because that is how TCP has always been written.
I know that in DCCP CCID3 the RTO is 4 x RTT (from memory; it might
be a slight variation), but we ended up putting a minimum on it, because
you also face a problem if the timer fires too frequently (i.e. when the
link RTT is measured in usecs).
I might ask around on research lists and see why this issue has never
been revisited.
Now to the original issue: high-RTT links. If that is a problem, and I
believe it would be, then it's probably better to handle it on a
per-route basis or similar, although then we are becoming a de facto
"X x RTT" type of setup. Rereading the RFC, this doesn't actually seem
to be prohibited, and here is the code from DCCP CCID3 that we use:
/*
 * Update timeout interval for the nofeedback timer.
 * We use a configuration option to increase the lower bound.
 * This can help avoid triggering the nofeedback timer too
 * often ('spinning') on LANs with small RTTs.
 */
hctx->ccid3hctx_t_rto = max_t(u32, 4 * hctx->ccid3hctx_rtt,
			      CONFIG_IP_DCCP_CCID3_RTO *
			      (USEC_PER_SEC/1000));
/*
 * Schedule no feedback timer to expire in
 * max(t_RTO, 2 * s/X) = max(t_RTO, 2 * t_ipi)
 */
t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);

ccid3_pr_debug("%s(%p), Scheduled no feedback timer to "
	       "expire in %lu jiffies (%luus)\n",
	       dccp_role(sk),
	       sk, usecs_to_jiffies(t_nfb), t_nfb);

sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
	       jiffies + usecs_to_jiffies(t_nfb));
Maybe the TCP code could do this too (with a sysctl to turn the
behaviour on and off); that would save system administrators from having
to "tune" the TCP stack when they want this sort of behaviour.
Ian
--
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz