netdev - Re: [PATCH net] tcp: prevent bogus FRTO undos with non-SACK flows

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 29 Jun 2018 22:52:14 +0300 (EEST)
From:   Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
To:     Neal Cardwell <ncardwell@...gle.com>
cc:     Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Michal Kubecek <mkubecek@...e.cz>
Subject: Re: [PATCH net] tcp: prevent bogus FRTO undos with non-SACK flows

On Fri, 29 Jun 2018, Neal Cardwell wrote:

> On Fri, Jun 29, 2018 at 6:07 AM Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
> >
> > If SACK is not enabled and the first cumulative ACK after the RTO
> > retransmission covers more than the retransmitted skb, a spurious
> > FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> > The reason is that any non-retransmitted segment acknowledged will
> > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > no indication that it would have been delivered for real (the
> > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > case so the check for that bit won't help like it does with SACK).
> > Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
> > in tcp_process_loss.
> >
> > We need to use more strict condition for non-SACK case and check
> > that none of the cumulatively ACKed segments were retransmitted
> > to prove that progress is due to original transmissions. Only then
> > keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
> > non-SACK case.
> >
> > (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
> > to better indicate its purpose but to keep this change minimal, it
> > will be done in another patch).
> >
> > Besides burstiness and congestion control violations, this problem
> > can result in RTO loop: When the loss recovery is prematurely
> > undoed, only new data will be transmitted (if available) and
> > the next retransmission can occur only after a new RTO which in case
> > of multiple losses (that are not for consecutive packets) requires
> > one RTO per loss to recover.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> > ---
> >  net/ipv4/tcp_input.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 045d930..8e5522c 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -3181,6 +3181,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
> >
> >                 if (tcp_is_reno(tp)) {
> >                         tcp_remove_reno_sacks(sk, pkts_acked);
> > +
> > +                       /* If any of the cumulatively ACKed segments was
> > +                        * retransmitted, non-SACK case cannot confirm that
> > +                        * progress was due to original transmission due to
> > +                        * lack of TCPCB_SACKED_ACKED bits even if some of
> > +                        * the packets may have been never retransmitted.
> > +                        */
> > +                       if (flag & FLAG_RETRANS_DATA_ACKED)
> > +                               flag &= ~FLAG_ORIG_SACK_ACKED;
> >                 } else {
> >                         int delta;
> 
> Thanks, Ilpo. I ran this through our TCP packetdrill tests and only
> got one failure, which detected the change you made, so that on the
> first ACK in a non-SACK F-RTO case we stay in CA_Loss.

Great, thanks for testing.

> However, depending on the exact scenario we are concerned about here,
> this may not be a complete solution. Consider the packetdrill test I
> cooked below, which is for a scenario where we imagine a non-SACK
> connection, with data packet #1 and all dupacks lost. With your patch
> we don't have an F-RTO undo on the first cumulative ACK, which is a
> definite improvement. However, we end up with an F-RTO undo on the
> *second* cumulative ACK, because the second ACK didn't cover any
> retransmitted data. That means that we end up in the second round trip
> back in the initial slow-start mode with a very high cwnd and infinite
> ssthresh, even though data was actually lost in the first round trip.
>
> I'm not sure how much we care about all these cases. Perhaps we should
> just disable F-RTO if the connection has no SACK enabled? These
> non-SACK connections are just such a rare case at this point that I
> would propose it's not worth spending too much time on this.
> 
> The following packetdrill script passes when Ilpo's patch is applied.
> This documents the behavior, which I think is questionable:
> 
>     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
> 
>    +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
>  +.02 < . 1:1(0) ack 1 win 257
>    +0 accept(3, ..., ...) = 4
>    +0 write(4, ..., 27000) = 27000
>    +0 > . 1:10001(10000) ack 1
> 
> // Suppose 1:1001 is lost and all dupacks are lost.
> 
> // RTO and retransmit head. This fills a real hole.
>  +.22 > . 1:1001(1000) ack 1
> 
> +.005 < . 1:1(0) ack 2001 win 257

Why did the receiver send a cumulative ACK only for 2001?

>    +0 > . 10001:13001(3000) ack 1
>    +0 %{ assert tcpi_ca_state == TCP_CA_Loss, tcpi_ca_state }%
>    +0 %{ assert tcpi_snd_cwnd == 3, tcpi_snd_cwnd }%
>    +0 %{ assert tcpi_snd_ssthresh == 7, tcpi_snd_ssthresh }%
> 
> +.002 < . 1:1(0) ack 10001 win 257

...And then magically all packets were received up to 10001 here as we
get cumulative ACK that far? None were lost I guess? Were they perhaps 
reordered past RTO?

>    +0 %{ assert tcpi_ca_state == TCP_CA_Open, tcpi_ca_state }%
>    +0 %{ assert tcpi_snd_cwnd == 18, tcpi_snd_cwnd }%
>    +0 %{ assert tcpi_snd_ssthresh == 2147483647, tcpi_snd_ssthresh }%
>    +0 > . 13001:15001(2000) ack 1

This looks to me just another "lets defeat the heuristics" pattern rather 
than what a realistic receiver / properly functioning "middlebox" should 
do, or do I miss something? I think that the power/ability with 
packetdrill sets a trap for us we need to be careful not to step into: It 
is very possible to overdo scenarios with packetdrill to a point where 
they stop being realistic.


When we go to this road, we might also look into FRTO with SACK with >1 
hole and all ACKs lost. I think it will trigger exactly the same issue my 
patch attempts to fix for non-SACK cases because scoreboard won't get 
S-bits from those non-arriving ACKs (it could also trigger the scenario 
you describe). ...I guess it could be fixed by limiting the undo check 
really into step 3 rather than possibly invoking it already in step 2, 
which is a subtle difference with the current Linux implementation and 
FRTO RFC. But I want to state (now that I bring this scenario up), that I 
think this is much much less likely to trigger than the issue my fix 
addresses (very possible to be true even considering the huge bias in # of 
SACK enabled vs non-SACK flow).

...Beyond that one, I'm yet to see/come up a convincing scenario due to 
FRTO heuristics.


-- 
 i.