lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 18 Nov 2017 00:04:38 -0500
From:   Soheil Hassas Yeganeh <soheil.kdev@...il.com>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net] tcp: when scheduling TLP, time of RTO should account
 for current ACK

On Fri, Nov 17, 2017 at 9:06 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
>
> Fix the TLP scheduling logic so that when scheduling a TLP probe, we
> ensure that the estimated time at which an RTO would fire accounts for
> the fact that ACKs indicating forward progress should push back RTO
> times.
>
> After the following fix:
>
> df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
>
> we had an unintentional behavior change in the following kind of
> scenario: suppose the RTT variance has been very low recently. Then
> suppose we send out a flight of N packets and our RTT is 100ms:
>
> t=0: send a flight of N packets
> t=100ms: receive an ACK for N-1 packets
>
> The response before df92c8394e6e that was:
>   -> schedule a TLP for now + RTO_interval
>
> The response after df92c8394e6e is:
>   -> schedule a TLP for t=0 + RTO_interval
>
> Since RTO_interval = srtt + RTT_variance, this means that we have
> scheduled a TLP timer at a point in the future that only accounts for
> RTT_variance. If the RTT_variance term is small, this means that the
> timer fires soon.
>
> Before df92c8394e6e this would not happen, because in that code, when
> we receive an ACK for a prefix of flight, we did:
>
>     1) Near the top of tcp_ack(), switch from TLP timer to RTO
>        at write_queue_head->paket_tx_time + RTO_interval:
>             if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>                    tcp_rearm_rto(sk);
>
>     2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
>             if (flag & FLAG_ACKED) {
>                    tcp_rearm_rto(sk);
>
>     3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO
>        to TLP at now + RTO_interval:
>             if (icsk->icsk_pending == ICSK_TIME_RETRANS)
>                    tcp_schedule_loss_probe(sk);
>
> In df92c8394e6e we removed that 3-phase dance, and instead directly
> set the TLP timer once: we set the TLP timer in cases like this to
> write_queue_head->packet_tx_time + RTO_interval. So if the RTT
> variance is small, then this means that this is setting the TLP timer
> to fire quite soon. This means if the ACK for the tail of the flight
> takes longer than an RTT to arrive (often due to delayed ACKs), then
> the TLP timer fires too quickly.
>
> Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
> Signed-off-by: Neal Cardwell <ncardwell@...gle.com>
> Signed-off-by: Yuchung Cheng <ycheng@...gle.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Acked-by: Soheil Hassas Yeganeh <soheil@...gle.com>

Nice fix. Thank you, Neal!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ