Message-ID: <4E5619DA.6070902@arndnet.de>
Date:	Thu, 25 Aug 2011 11:46:02 +0200
From:	Arnd Hannemann <arnd@...dnet.de>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Alexander Zimmermann <alexander.zimmermann@...sys.rwth-aachen.de>,
	Yuchung Cheng <ycheng@...gle.com>,
	Hagen Paul Pfeifer <hagen@...u.net>,
	netdev <netdev@...r.kernel.org>,
	Lukowski Damian <damian@....rwth-aachen.de>
Subject: Re: [PATCH] tcp: bound RTO to minimum

Hi Eric,

On 25.08.2011 11:09, Eric Dumazet wrote:
> On Thursday, 25 August 2011 at 10:46 +0200, Arnd Hannemann wrote:
>> On 25.08.2011 10:26, Eric Dumazet wrote:
>>> On Thursday, 25 August 2011 at 09:28 +0200, Alexander Zimmermann wrote:
>>>> On 25.08.2011 at 07:28, Eric Dumazet wrote:
>>>
>>>>> Real question is: do we really want to process ~1000 timer interrupts
>>>>> per tcp session, ~2000 skb alloc/free/build/handling operations, and possibly
>>>>> ~1000 ARP requests, only to make TCP recover in ~1s when connectivity
>>>>> returns? This just doesn't scale.
>>>>
>>>> Maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
>>>> probing time of 120s, we get 600 retransmits in a worst-case scenario
>>>> (assuming we get an ICMP for every RTO retransmission). No?
>>>
>>> Where is the "max probing time of 120s" asserted?
>>>
>>> That is not the case on my machine:
>>> I see far more retransmits than that, even when spaced 1600 ms apart
>>>
>>> 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
>>> 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
>>> 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
>>>
>>> Old kernels performed up to 15 retries, with exponential backoff.
>>>
>>> Now it's effectively unlimited, according to experimental results.
>>
>> That shouldn't happen. It should stop after the same time as a TCP connection with
>> an RTO of the minimum RTO doing 15 retries (tcp_retries2=15) with exponential
>> backoff, so it should be around 900s*. But it could be that, because of the
>> icsk_retransmits wraparound, this doesn't work as expected.
>>
>> * 200ms + 400ms + 800ms ...
> 
> It is 924 seconds with retries2=15 (the default value).
> 
> I said ~1000 probes.
> 
> If ICMPs are not rate limited, that could be about 924*5 probes, instead
> of 15 probes on old kernels.

At a rate of 5 packets/s (one probe per 200ms minimum RTO) if the RTT is zero,
yes. I would like to say: so what? But your example with millions of idle
connections stands.
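
Just to double-check the 924s figure, here is a quick userspace sketch, assuming
the usual defaults (TCP_RTO_MIN = 200ms, TCP_RTO_MAX = 120s, tcp_retries2 = 15);
this is plain arithmetic, not kernel code:

#include <stdio.h>

int main(void)
{
	const double rto_min = 0.2;	/* TCP_RTO_MIN, in seconds */
	const double rto_max = 120.0;	/* TCP_RTO_MAX, in seconds */
	const int retries2 = 15;	/* tcp_retries2 default */
	double rto = rto_min, total = 0.0;
	int i;

	/* Each retransmission doubles the RTO, capped at TCP_RTO_MAX. */
	for (i = 0; i <= retries2; i++) {
		total += rto;
		rto *= 2;
		if (rto > rto_max)
			rto = rto_max;
	}
	printf("total probing time: %.1fs\n", total);	/* prints ~924.6s */
	return 0;
}

The six timeouts that hit the 120s cap dominate the sum.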

> Maybe we should refine the thing a bit, and not reverse the backoff unless
> the rto is > some_threshold.
> 
> Say the value is 10s; that would give at most 92 tries.
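
For concreteness, the arithmetic behind these numbers, as I understand the
proposal: if the reverted RTO is floored at some threshold instead of
TCP_RTO_MIN, the worst case is one probe per threshold interval over the ~924s
window. A quick sketch (not the actual patch; the candidate floors other than
your 10s are made up):

#include <stdio.h>

int main(void)
{
	const double window = 924.0;	/* total probing time, in seconds */
	const double floors[] = { 0.2, 1.0, 10.0 };	/* candidate RTO floors */
	unsigned int i;

	/* Worst case: one probe per floor interval. */
	for (i = 0; i < sizeof(floors) / sizeof(floors[0]); i++)
		printf("floor %5.1fs -> at most %4.0f probes\n",
		       floors[i], window / floors[i]);
	return 0;
}

With the current 200ms floor that is the ~4620 (924*5) probes you mention;
with 10s it is your 92.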

I personally think that 10s would be too large and would eliminate the benefit
of the algorithm, so I would prefer a different solution.

For a single bulk-data TCP session that was transmitting hundreds of packets/s
before the connectivity disruption, this worst-case rate of 5 packets/s really
seems conservative enough.

However, for a lot of idle connections that were each transmitting only a few
packets per minute, we might drastically increase the rate for a certain period
until the backoff throttles it down again. You are saying that we have a
problem here, correct?

Do you think it would be possible, without much hassle, to apply a kind of
"global" rate limit only to these probe packets of a TCP connection?
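
What I have in mind is something like a single token bucket shared by all
sockets, consumed by every such probe. A minimal userspace sketch follows; all
names and numbers are made up, and this is meant as an idea, not kernel code:

#include <stdbool.h>
#include <time.h>

static double probe_tokens;			/* current global budget */
static double last_refill;			/* timestamp of the last refill */
static const double PROBE_RATE = 100.0;		/* allowed probes per second, global */
static const double PROBE_BURST = 100.0;	/* maximum burst budget */

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Return true if this connection may send a probe right now;
 * if not, it would simply fall back to plain exponential backoff. */
static bool probe_allowed(void)
{
	double now = now_sec();

	probe_tokens += (now - last_refill) * PROBE_RATE;
	if (probe_tokens > PROBE_BURST)
		probe_tokens = PROBE_BURST;
	last_refill = now;

	if (probe_tokens < 1.0)
		return false;
	probe_tokens -= 1.0;
	return true;
}

A single bulk session recovering alone would hardly notice the limit, while
millions of idle connections waking up at once would be spread out instead of
each probing at 5 packets/s.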

> I mean, what is the gain in being able to restart a frozen TCP session with
> 1s of latency instead of 10s, if it was blocked for more than 60 seconds?

I'm afraid it gains a lot, especially in highly dynamic environments. You
don't just incur the additional latency; you may actually miss the whole
window in which connectivity was available, and then retransmit right into
the next connectivity-disrupted period.

Best regards,
Arnd


