[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1806292220150.26571@whs-18.cs.helsinki.fi>
Date: Fri, 29 Jun 2018 22:52:14 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
To: Neal Cardwell <ncardwell@...gle.com>
cc: Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Michal Kubecek <mkubecek@...e.cz>
Subject: Re: [PATCH net] tcp: prevent bogus FRTO undos with non-SACK flows
On Fri, 29 Jun 2018, Neal Cardwell wrote:
> On Fri, Jun 29, 2018 at 6:07 AM Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
> >
> > If SACK is not enabled and the first cumulative ACK after the RTO
> > retransmission covers more than the retransmitted skb, a spurious
> > FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> > The reason is that any non-retransmitted segment acknowledged will
> > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > no indication that it would have been delivered for real (the
> > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > case so the check for that bit won't help like it does with SACK).
> > Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
> > in tcp_process_loss.
> >
> > We need to use more strict condition for non-SACK case and check
> > that none of the cumulatively ACKed segments were retransmitted
> > to prove that progress is due to original transmissions. Only then
> > keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
> > non-SACK case.
> >
> > (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
> > to better indicate its purpose but to keep this change minimal, it
> > will be done in another patch).
> >
> > Besides burstiness and congestion control violations, this problem
> > can result in RTO loop: When the loss recovery is prematurely
> > undoed, only new data will be transmitted (if available) and
> > the next retransmission can occur only after a new RTO which in case
> > of multiple losses (that are not for consecutive packets) requires
> > one RTO per loss to recover.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> > ---
> > net/ipv4/tcp_input.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 045d930..8e5522c 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -3181,6 +3181,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
> >
> > if (tcp_is_reno(tp)) {
> > tcp_remove_reno_sacks(sk, pkts_acked);
> > +
> > + /* If any of the cumulatively ACKed segments was
> > + * retransmitted, non-SACK case cannot confirm that
> > + * progress was due to original transmission due to
> > + * lack of TCPCB_SACKED_ACKED bits even if some of
> > + * the packets may have been never retransmitted.
> > + */
> > + if (flag & FLAG_RETRANS_DATA_ACKED)
> > + flag &= ~FLAG_ORIG_SACK_ACKED;
> > } else {
> > int delta;
>
> Thanks, Ilpo. I ran this through our TCP packetdrill tests and only
> got one failure, which detected the change you made, so that on the
> first ACK in a non-SACK F-RTO case we stay in CA_Loss.
Great, thanks for testing.
> However, depending on the exact scenario we are concerned about here,
> this may not be a complete solution. Consider the packetdrill test I
> cooked below, which is for a scenario where we imagine a non-SACK
> connection, with data packet #1 and all dupacks lost. With your patch
> we don't have an F-RTO undo on the first cumulative ACK, which is a
> definite improvement. However, we end up with an F-RTO undo on the
> *second* cumulative ACK, because the second ACK didn't cover any
> retransmitted data. That means that we end up in the second round trip
> back in the initial slow-start mode with a very high cwnd and infinite
> ssthresh, even though data was actually lost in the first round trip.
>
> I'm not sure how much we care about all these cases. Perhaps we should
> just disable F-RTO if the connection has no SACK enabled? These
> non-SACK connections are just such a rare case at this point that I
> would propose it's not worth spending too much time on this.
>
> The following packetdrill script passes when Ilpo's patch is applied.
> This documents the behavior, which I think is questionable:
>
> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
> +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
> +.02 < . 1:1(0) ack 1 win 257
> +0 accept(3, ..., ...) = 4
> +0 write(4, ..., 27000) = 27000
> +0 > . 1:10001(10000) ack 1
>
> // Suppose 1:1001 is lost and all dupacks are lost.
>
> // RTO and retransmit head. This fills a real hole.
> +.22 > . 1:1001(1000) ack 1
>
> +.005 < . 1:1(0) ack 2001 win 257
Why did the receiver send a cumulative ACK only for 2001?
> +0 > . 10001:13001(3000) ack 1
> +0 %{ assert tcpi_ca_state == TCP_CA_Loss, tcpi_ca_state }%
> +0 %{ assert tcpi_snd_cwnd == 3, tcpi_snd_cwnd }%
> +0 %{ assert tcpi_snd_ssthresh == 7, tcpi_snd_ssthresh }%
>
> +.002 < . 1:1(0) ack 10001 win 257
...And then magically all packets were received up to 10001 here as we
get cumulative ACK that far? None were lost I guess? Were they perhaps
reordered past RTO?
> +0 %{ assert tcpi_ca_state == TCP_CA_Open, tcpi_ca_state }%
> +0 %{ assert tcpi_snd_cwnd == 18, tcpi_snd_cwnd }%
> +0 %{ assert tcpi_snd_ssthresh == 2147483647, tcpi_snd_ssthresh }%
> +0 > . 13001:15001(2000) ack 1
This looks to me just another "lets defeat the heuristics" pattern rather
than what a realistic receiver / properly functioning "middlebox" should
do, or do I miss something? I think that the power/ability with
packetdrill sets a trap for us we need to be careful not to step into: It
is very possible to overdo scenarios with packetdrill to a point where
they stop being realistic.
When we go to this road, we might also look into FRTO with SACK with >1
hole and all ACKs lost. I think it will trigger exactly the same issue my
patch attempts to fix for non-SACK cases because scoreboard won't get
S-bits from those non-arriving ACKs (it could also trigger the scenario
you describe). ...I guess it could be fixed by limiting the undo check
really into step 3 rather than possibly invoking it already in step 2,
which is a subtle difference with the current Linux implementation and
FRTO RFC. But I want to state (now that I bring this scenario up), that I
think this is much much less likely to trigger than the issue my fix
addresses (very possible to be true even considering the huge bias in # of
SACK enabled vs non-SACK flow).
...Beyond that one, I'm yet to see/come up a convincing scenario due to
FRTO heuristics.
--
i.
Powered by blists - more mailing lists