netdev - Re: [PATCH net] tcp: prevent bogus FRTO undos with non-SACK flows

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADVnQy=1UWUNk-J5=ytVLUoivL7OG0uYh_gJfL1mNLcTr+zuBg@mail.gmail.com>
Date:   Fri, 29 Jun 2018 10:31:36 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc:     Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Michal Kubecek <mkubecek@...e.cz>
Subject: Re: [PATCH net] tcp: prevent bogus FRTO undos with non-SACK flows

On Fri, Jun 29, 2018 at 6:07 AM Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
>
> If SACK is not enabled and the first cumulative ACK after the RTO
> retransmission covers more than the retransmitted skb, a spurious
> FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> The reason is that any non-retransmitted segment acknowledged will
> set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> no indication that it would have been delivered for real (the
> scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> case so the check for that bit won't help like it does with SACK).
> Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
> in tcp_process_loss.
>
> We need to use more strict condition for non-SACK case and check
> that none of the cumulatively ACKed segments were retransmitted
> to prove that progress is due to original transmissions. Only then
> keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
> non-SACK case.
>
> (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
> to better indicate its purpose but to keep this change minimal, it
> will be done in another patch).
>
> Besides burstiness and congestion control violations, this problem
> can result in RTO loop: When the loss recovery is prematurely
> undoed, only new data will be transmitted (if available) and
> the next retransmission can occur only after a new RTO which in case
> of multiple losses (that are not for consecutive packets) requires
> one RTO per loss to recover.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> ---
>  net/ipv4/tcp_input.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 045d930..8e5522c 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3181,6 +3181,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
>
>                 if (tcp_is_reno(tp)) {
>                         tcp_remove_reno_sacks(sk, pkts_acked);
> +
> +                       /* If any of the cumulatively ACKed segments was
> +                        * retransmitted, non-SACK case cannot confirm that
> +                        * progress was due to original transmission due to
> +                        * lack of TCPCB_SACKED_ACKED bits even if some of
> +                        * the packets may have been never retransmitted.
> +                        */
> +                       if (flag & FLAG_RETRANS_DATA_ACKED)
> +                               flag &= ~FLAG_ORIG_SACK_ACKED;
>                 } else {
>                         int delta;

Thanks, Ilpo. I ran this through our TCP packetdrill tests and only
got one failure, which detected the change you made, so that on the
first ACK in a non-SACK F-RTO case we stay in CA_Loss.

However, depending on the exact scenario we are concerned about here,
this may not be a complete solution. Consider the packetdrill test I
cooked below, which is for a scenario where we imagine a non-SACK
connection, with data packet #1 and all dupacks lost. With your patch
we don't have an F-RTO undo on the first cumulative ACK, which is a
definite improvement. However, we end up with an F-RTO undo on the
*second* cumulative ACK, because the second ACK didn't cover any
retransmitted data. That means that we end up in the second round trip
back in the initial slow-start mode with a very high cwnd and infinite
ssthresh, even though data was actually lost in the first round trip.

I'm not sure how much we care about all these cases. Perhaps we should
just disable F-RTO if the connection has no SACK enabled? These
non-SACK connections are just such a rare case at this point that I
would propose it's not worth spending too much time on this.

The following packetdrill script passes when Ilpo's patch is applied.
This documents the behavior, which I think is questionable:

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
 +.02 < . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4
   +0 write(4, ..., 27000) = 27000
   +0 > . 1:10001(10000) ack 1

// Suppose 1:1001 is lost and all dupacks are lost.

// RTO and retransmit head. This fills a real hole.
 +.22 > . 1:1001(1000) ack 1

+.005 < . 1:1(0) ack 2001 win 257
   +0 > . 10001:13001(3000) ack 1
   +0 %{ assert tcpi_ca_state == TCP_CA_Loss, tcpi_ca_state }%
   +0 %{ assert tcpi_snd_cwnd == 3, tcpi_snd_cwnd }%
   +0 %{ assert tcpi_snd_ssthresh == 7, tcpi_snd_ssthresh }%

+.002 < . 1:1(0) ack 10001 win 257
   +0 %{ assert tcpi_ca_state == TCP_CA_Open, tcpi_ca_state }%
   +0 %{ assert tcpi_snd_cwnd == 18, tcpi_snd_cwnd }%
   +0 %{ assert tcpi_snd_ssthresh == 2147483647, tcpi_snd_ssthresh }%
   +0 > . 13001:15001(2000) ack 1

---

neal