netdev - Re: [PATCH] make _minimum_ TCP retransmission timeout configurable


Message-ID: <5640c7e00708291510p778f387w51d50e981ba49a25@mail.gmail.com>
Date:	Thu, 30 Aug 2007 10:10:37 +1200
From:	"Ian McDonald" <ian.mcdonald@...di.co.nz>
To:	"David Miller" <davem@...emloft.net>
Cc:	rick.jones2@...com, netdev@...r.kernel.org
Subject: Re: [PATCH] make _minimum_ TCP retransmission timeout configurable

On 8/30/07, David Miller <davem@...emloft.net> wrote:
> From: "Ian McDonald" <ian.mcdonald@...di.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
>
> Everyone uses this value, even BSD since ancient times.
>
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
>
I understand what you are saying. That is why I questioned it: 200 msecs
makes no sense on a LAN with < 1 msec RTT. So if the current value is
ridiculous and 1000 is even more so, why do we use it? Just because that
is how TCP is written, I'm guessing.

I know that in DCCP CCID3 the RTO is 4 x RTT (from memory - it might
be a slight variation), but we ended up putting a minimum on it, as you
also face a problem if it fires too frequently (i.e. when the link RTT
is in usecs).

I might ask around on research lists and see why this issue has never
been revisited.

Now to the original issue - high RTT links. If that is an issue, and I
believe it would be, then it's probably better to do this on a per
route basis or similar, although then we're becoming a de facto X x RTT
type setup. Rereading the RFC, this actually doesn't seem to be
prohibited, and here is the code from DCCP CCID3 that we use:

		/*
		 * Update timeout interval for the nofeedback timer.
		 * We use a configuration option to increase the lower bound.
		 * This can help avoid triggering the nofeedback timer too
		 * often ('spinning') on LANs with small RTTs.
		 */
		hctx->ccid3hctx_t_rto = max_t(u32, 4 * hctx->ccid3hctx_rtt,
						   CONFIG_IP_DCCP_CCID3_RTO *
						   (USEC_PER_SEC/1000));
		/*
		 * Schedule no feedback timer to expire in
		 * max(t_RTO, 2 * s/X)  =  max(t_RTO, 2 * t_ipi)
		 */
		t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);

		ccid3_pr_debug("%s(%p), Scheduled no feedback timer to "
			       "expire in %lu jiffies (%luus)\n",
			       dccp_role(sk),
			       sk, usecs_to_jiffies(t_nfb), t_nfb);

		sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
				   jiffies + usecs_to_jiffies(t_nfb));
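
To make the arithmetic in that snippet easier to follow outside the
kernel, here is a rough userspace sketch of the same two max() steps.
The function names and the 100 msec floor are my own placeholders
(standing in for CONFIG_IP_DCCP_CCID3_RTO), not the real kernel
symbols or necessarily the real default:

```c
#include <stdint.h>

#define USEC_PER_MSEC 1000u

/* Illustrative floor in msecs; a stand-in for CONFIG_IP_DCCP_CCID3_RTO. */
#define RTO_FLOOR_MSEC 100u

/* t_rto = max(4 * RTT, configured floor), everything in usecs. */
static inline uint64_t t_rto_usecs(uint32_t rtt_usecs)
{
	uint64_t scaled = 4ull * rtt_usecs;
	uint64_t floor_usecs = (uint64_t)RTO_FLOOR_MSEC * USEC_PER_MSEC;

	return scaled > floor_usecs ? scaled : floor_usecs;
}

/* t_nfb = max(t_rto, 2 * t_ipi), where t_ipi is the inter-packet interval. */
static inline uint64_t t_nfb_usecs(uint32_t rtt_usecs, uint32_t t_ipi_usecs)
{
	uint64_t t_rto = t_rto_usecs(rtt_usecs);
	uint64_t two_ipi = 2ull * t_ipi_usecs;

	return t_rto > two_ipi ? t_rto : two_ipi;
}
```

With these placeholder numbers, a 500 usec LAN RTT is held at the
100 msec floor, while a 300 msec RTT gives 4 x RTT = 1.2 secs, so the
floor only matters on short paths.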

Maybe the TCP code could do this also (with a sysctl to turn the
behaviour off and on), and then it would save system administrators
having to "tune" the TCP stack if they want this sort of behaviour.
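
Purely as a sketch of what I mean, and not a patch: the lower bound on
the RTO could be picked roughly like this. The struct, its field names
and the function are all hypothetical; no sysctl with these names
exists, and the real TCP code works in jiffies rather than msecs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical knobs; none of these are real sysctl names. */
struct rto_policy {
	bool     scale_with_rtt;  /* the on/off switch suggested above */
	uint32_t rtt_multiplier;  /* the "X" in X x RTT */
	uint32_t floor_msecs;     /* fixed lower bound, e.g. 200 */
};

/* Lower bound on the RTO in msecs for a given smoothed RTT. */
static inline uint64_t rto_lower_bound_msecs(const struct rto_policy *p,
					     uint32_t srtt_msecs)
{
	uint64_t bound = p->floor_msecs;

	if (p->scale_with_rtt) {
		uint64_t scaled = (uint64_t)p->rtt_multiplier * srtt_msecs;

		/* On high RTT links the scaled value takes over. */
		if (scaled > bound)
			bound = scaled;
	}
	return bound;
}
```

With the switch off this is just the fixed floor as today; with it on,
a 600 msec RTT path and X = 4 would get a 2400 msec lower bound without
anyone touching a per route setting.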

Ian
-- 
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz