Date:	Thu, 25 Aug 2011 07:28:54 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Yuchung Cheng <ycheng@...gle.com>
Cc:	Hagen Paul Pfeifer <hagen@...u.net>, netdev@...r.kernel.org
Subject: Re: [PATCH] tcp: bound RTO to minimum

On Wednesday, August 24, 2011 at 18:50 -0700, Yuchung Cheng wrote:
> On Wed, Aug 24, 2011 at 4:41 PM, Hagen Paul Pfeifer <hagen@...u.net> wrote:
> > Check if the calculated RTO is less than TCP_RTO_MIN. If this is true,
> > we adjust the value to TCP_RTO_MIN.
> >
> but tp->rttvar is already lower-bounded via tcp_rto_min()?
> 
> static inline void tcp_set_rto(struct sock *sk)
> {
> ...
> 
>   /* NOTE: clamping at TCP_RTO_MIN is not required, current algo
>    * guarantees that rto is higher.
>    */
>   tcp_bound_rto(sk);
> }
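
For reference, here is a minimal userspace sketch (not the kernel's exact
code; names, constants, and the plain-integer math are illustrative, the
kernel uses fixed-point) of why that quoted comment holds: Linux floors the
variance term at the RTO minimum, so rto = srtt + rttvar cannot come out
below it.

#include <stdio.h>

#define RTO_MIN_MS 200     /* kernel: TCP_RTO_MIN = HZ/5, i.e. 200 ms */
#define RTO_MAX_MS 120000  /* kernel: TCP_RTO_MAX = 120*HZ */

struct rtt_state {
	int srtt_ms;    /* smoothed RTT */
	int rttvar_ms;  /* RTT variance term */
};

/* RFC 6298-style estimator with the Linux twist: the variance term is
 * floored at RTO_MIN_MS, so the final rto needs no clamp at the minimum. */
static void rtt_sample(struct rtt_state *s, int m_ms)
{
	if (s->srtt_ms == 0) {          /* first measurement */
		s->srtt_ms = m_ms;
		s->rttvar_ms = m_ms / 2;
	} else {
		int err = m_ms - s->srtt_ms;
		s->srtt_ms += err / 8;                    /* alpha = 1/8 */
		if (err < 0)
			err = -err;
		s->rttvar_ms += (err - s->rttvar_ms) / 4; /* beta = 1/4 */
	}
	if (s->rttvar_ms < RTO_MIN_MS)  /* the floor the comment relies on */
		s->rttvar_ms = RTO_MIN_MS;
}

static int rto_ms(const struct rtt_state *s)
{
	int rto = s->srtt_ms + s->rttvar_ms;  /* >= RTO_MIN_MS by construction */
	return rto > RTO_MAX_MS ? RTO_MAX_MS : rto;  /* only the max needs clamping */
}

int main(void)
{
	struct rtt_state s = { 0, 0 };
	int samples[] = { 10, 12, 9, 11 };  /* a fast LAN path */
	for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		rtt_sample(&s, samples[i]);
	printf("srtt=%d ms rttvar=%d ms rto=%d ms\n",
	       s.srtt_ms, s.rttvar_ms, rto_ms(&s));  /* rto >= 200 ms on a 10 ms path */
	return 0;
}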

Yes, and furthermore, we also rate-limit ICMP, so in my tests I reach
icsk_rto > 1 sec within a few rounds:

07:16:13.010633 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 3833540215:3833540263(48) ack 2593537670 win 305
07:16:13.221111 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:13.661151 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:14.541153 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:16.301152 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
<from this point on, icsk_rto = 1.76 sec>
07:16:18.061158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:19.821158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:21.581018 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:23.341156 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:25.101151 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:26.861155 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:28.621158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:30.381152 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
07:16:32.141157 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305 
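
The doubling in this trace is just the exponential retransmit backoff
starting from the ~200 ms RTO floor; three doublings already put the timer
above one second, matching the intervals above (~0.21 s, 0.44 s, 0.88 s,
1.76 s). A trivial sketch of that schedule, constants illustrative:

#include <stdio.h>

int main(void)
{
	/* Exponential retransmit backoff from the RTO floor; compare the
	 * intervals in the trace above: ~0.21 s, 0.44 s, 0.88 s, 1.76 s. */
	int rto_ms = 200;                  /* TCP_RTO_MIN */
	for (int retry = 1; retry <= 4; retry++) {
		printf("retry %d fires after %d ms\n", retry, rto_ms);
		rto_ms *= 2;               /* icsk_backoff doubling */
	}
	return 0;
}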

The real question is: do we really want to process ~1000 timer interrupts
per TCP session, ~2000 skb alloc/free/build/handling operations, and
possibly ~1000 ARP requests, only to make TCP recover in ~1 sec when
connectivity comes back? This just doesn't scale.

On a server handling ~1,000,000 (long-lived) sessions using
application-side keepalives (say, one message sent every minute on each
session), a temporary connectivity disruption _could_ make it enter a
critical zone, burning CPU and memory.
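
A back-of-envelope sketch of that critical zone, assuming TCP-LCD keeps the
RTO pinned near the 200 ms floor for the whole disruption; the outage length
and all figures are illustrative:

#include <stdio.h>

int main(void)
{
	double rto_s = 0.2;        /* RTO held near TCP_RTO_MIN by TCP-LCD */
	double outage_s = 200.0;   /* a ~3 minute connectivity loss */
	long sessions = 1000000;   /* long-lived sessions on one server */

	double timers = outage_s / rto_s;          /* ~1000 firings per session */
	double total = timers * (double)sessions;  /* ~1e9 firings server-wide */

	printf("%.0f timer firings per session, %.1e total\n", timers, total);
	printf("plus ~%.0f skb alloc/free and up to ~%.0f ARP requests per session\n",
	       2.0 * timers, timers);
	return 0;
}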

It seems TCP-LCD (RFC 6069) depends very much on ICMP being rate-limited.

I'll have to check what happens with multiple sessions: we might have CPUs
fighting over a single inetpeer and throttling, thus allowing the backoff
to increase after all.



