netdev - Re: [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK

Message-ID: <CADVnQymfO7EDqUvhtE=n=AGmND1ajfUPzcPLR6wB7PBVzYRSZA@mail.gmail.com>
Date:   Sat, 18 Jul 2020 23:29:34 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     hujunwei <hujunwei4@...wei.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        David Miller <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Netdev <netdev@...r.kernel.org>, wangxiaogang3@...wei.com,
        jinyiting@...wei.com, xuhanbing@...wei.com, zhengshaoyu@...wei.com,
        Yuchung Cheng <ycheng@...gle.com>,
        Ilpo Jarvinen <ilpo.jarvinen@...helsinki.fi>
Subject: Re: [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK

On Sat, Jul 18, 2020 at 6:43 AM hujunwei <hujunwei4@...wei.com> wrote:
>
>
> On 2020/7/17 22:44, Neal Cardwell wrote:
> > On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@...wei.com> wrote:
> >>
> >> From: Junwei Hu <hujunwei4@...wei.com>
> >>
> >> In the document of RFC2582(https://tools.ietf.org/html/rfc2582)
> >> introduced two separate scenarios for tcp congestion control:
> >
> > Can you please elaborate on how the sender is able to distinguish
> > between the two scenarios, after your patch?
> >
> > It seems to me that with this proposed patch, there is the risk of
> > spurious fast recoveries due to 3 dupacks in the second
> > scenario (the sender unnecessarily retransmitted three packets below
> > "send_high"). Can you please post a packetdrill test to demonstrate
> > that with this patch the TCP sender does not spuriously enter fast
> > recovery in such a scenario?
> >
> Hi Neal,
> Thanks for your quick reply!
> What I want to say is that when these three numbers (snd_una, high_seq and
> snd_nxt) are the same, it means that all data outstanding
> when the Loss state began has been successfully acknowledged.

Yes, that seems true.

> So it is time for the sender to exit to the Open state.
> I'm not sure whether my understanding is correct.

I don't think we want the sender to exit to the CA_Open state in these
circumstances. I think section 5 ("Avoiding Multiple Fast
Retransmits") of RFC 2582 states convincingly that senders should take
steps to avoid having duplicate acknowledgements at high_seq trigger a
new fast recovery. Linux TCP implements those steps by *not*
exiting to the Open state, and instead staying in CA_Loss or
CA_Recovery.
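
For illustration only, here is a minimal userspace sketch of that rule. It
is not kernel code: the struct, the on_ack() function, and the
hold_at_high_seq flag are hypothetical names invented for this example, and
the model ignores cwnd, snd_nxt, and retransmission entirely. It only shows
how a non-SACK sender that leaves CA_Loss as soon as snd_una reaches
high_seq can be pushed into fast recovery by three late duplicate ACKs,
while a sender that holds its state is not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of a non-SACK sender; not kernel code. */
enum ca_state { CA_OPEN, CA_RECOVERY, CA_LOSS };

struct sender {
    uint32_t snd_una;    /* oldest unacknowledged sequence number */
    uint32_t high_seq;   /* snd_nxt at the time loss recovery began */
    int dupacks;         /* consecutive duplicate ACKs seen so far */
    enum ca_state state;
};

/* Wraparound-safe "a is later than b" for 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/*
 * Process one cumulative ACK.
 * hold_at_high_seq == true:  leave CA_Loss only once something *above*
 *                            high_seq has been ACKed.
 * hold_at_high_seq == false: leave CA_Loss as soon as snd_una reaches
 *                            high_seq (the behavior under discussion).
 */
static void on_ack(struct sender *s, uint32_t ack, bool hold_at_high_seq)
{
    if (seq_after(ack, s->snd_una)) {
        s->snd_una = ack;        /* new data acknowledged */
        s->dupacks = 0;
    } else if (ack == s->snd_una) {
        s->dupacks++;            /* duplicate ACK */
    } else {
        return;                  /* stale ACK, ignore */
    }

    if (s->state == CA_LOSS) {
        bool done = hold_at_high_seq
            ? seq_after(s->snd_una, s->high_seq)    /* strictly above */
            : !seq_after(s->high_seq, s->snd_una);  /* at or above */
        if (done)
            s->state = CA_OPEN;
        return;                  /* dupacks never trigger recovery here */
    }

    /* Classic Reno trigger: three duplicate ACKs while in Open. */
    if (s->state == CA_OPEN && s->dupacks >= 3)
        s->state = CA_RECOVERY;
}
```

Feeding both variants one ACK at high_seq followed by three duplicates of
that ACK leaves the holding sender in CA_LOSS, while the early-exit sender
ends up in CA_RECOVERY.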

To make things more concrete, here is the kind of timeline/scenario I
am concerned about with your proposed patch. I have not had cycles to
cook a packetdrill test like this, but this is the basic idea:

[connection does not have SACK or TCP timestamps enabled]
app writes 4*SMSS
Send packets P1, P2, P3, P4
TLP, spurious retransmit of P4
spurious RTO, set cwnd to 1, enter CA_Loss, retransmit P1
receive ACK for P1 (original copy)
slow-start, increase cwnd to 2, retransmit P2, P3
receive ACK for P2 (original copy)
slow-start, increase cwnd to 3, retransmit P4
receive ACK for P3 (original copy)
slow-start, increase cwnd to 4
receive ACK for P4 (original copy)
slow-start, increase cwnd to 5
[with your patch, at this point the sender does not meet the
 conditions for "Hold old state until something *above* high_seq is ACKed.",
 so sender exits CA_Loss and enters Open]
app writes 4*SMSS
send P5, P6, P7, P8
receive dupack for P4 (due to spurious TLP retransmit of P4)
receive dupack for P4 (due to spurious CA_Loss retransmit of P1)
receive dupack for P4 (due to spurious CA_Loss retransmit of P2)
[with your patch, at this point we risk spuriously entering
 fast recovery because we have received 3 duplicate ACKs for P4]

A packetdrill test that shows that this is not the behavior of your
proposed patch would help support your proposed patch (presuming > is
replaced by after()).
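
As a rough illustration of why the plain > operator is unsafe here, the
sketch below gives userspace equivalents of the comparison helpers. The
names seq_before() and seq_after() are made up for this example; they are
modeled on the kernel's before()/after() helpers, which compare the signed
32-bit difference of the two sequence numbers. The example values are
arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if seq1 is earlier than seq2 in 32-bit sequence space.
 * The unsigned subtraction wraps modulo 2^32; interpreting the result
 * as signed gives the right answer as long as the two values are less
 * than 2^31 apart.
 */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* True if seq1 is later than seq2 in 32-bit sequence space. */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}

/*
 * Hypothetical check: has anything above high_seq been ACKed?
 * With high_seq = 0xFFFFFF00 and snd_una = 0x00000010 (the sequence
 * space has wrapped), "snd_una > high_seq" is false, but
 * seq_after(snd_una, high_seq) is true.
 */
static bool acked_above_high_seq(uint32_t snd_una, uint32_t high_seq)
{
    return seq_after(snd_una, high_seq);
}
```

When the two values are equal, both helpers return false, so equality still
has to be tested separately wherever it matters.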

best,
neal

> >> This patch enhances the TCP congestion control algorithm for
> >> connections that lack SACK.
> >
> > You describe this as an enhancement. Can you please elaborate on the
> > drawback/downside of staying in CA_Loss in this case you are
> > describing (where you used kprobes to find that TCP stayed in CA_Loss
> > state when high_seq was equal to snd_nxt)?
> >
> I tried, but I can't reproduce it with packetdrill. This problem appeared
> in our production environment. Here is part of the trace message:
>
> First ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
>         packets_out=4 retrans_out=1 sacked_out=0 lost_out=4 snd_nxt=3427491485
>         snd_una=3427485917 high_seq=3427491485 reordering=1 mss_cache=1392
>         icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1
>
> #tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) prior_snd_una=3427485917
>         num_dupack=0 packets_out=0 retrans_out=0 sacked_out=0 lost_out=0
>         snd_nxt=3427491485 snd_una=3427491485 high_seq=3427491485 reordering=1
>         mss_cache=1392 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1
>
> As we can see in tcp_fastretrans_alert, icsk_ca_state remains CA_Loss (4),
> and the numbers snd_nxt, snd_una and high_seq are the same.
>
> first dup ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
>         packets_out=2 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269
>         snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
>         icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2
>
> #tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) num_dupack=1 packets_out=2
>         retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269 snd_una=3427491485
>         high_seq=3427491485 reordering=1 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2
>
> second dup ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
>         packets_out=4 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427497053
>         snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
>         icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=4
>
> So, I really hope someone can answer whether my understanding is correct.
>
> > To deal with sequence number wrap-around, sequence number comparisons
> > in TCP need to use the before() and after() helpers, rather than
> > comparison operators. Here it seems the patch should use after()
> > rather than >. However, I think the larger concern is the one
> > mentioned above.
> >
> If this patch is useful, I will modify this.
>
> Regards Junwei
>