netdev - Re: [PATCH] tcp: del skb from tsorted_sent

Message-ID: <CADVnQynrEer3EBcDe2jeK4GNFOdKMFLwFgiXqjFg5CgAiBOjFA@mail.gmail.com>
Date:   Wed, 31 Aug 2022 08:46:20 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Yonglong Li <liyonglong@...natelecom.cn>
Cc:     Yuchung Cheng <ycheng@...gle.com>, netdev@...r.kernel.org,
        davem@...emloft.net, dsahern@...nel.org, edumazet@...gle.com,
        kuba@...nel.org, pabeni@...hat.com
Subject: Re: [PATCH] tcp: del skb from tsorted_sent_queue after mark it as lost

On Wed, Aug 31, 2022 at 3:19 AM Yonglong Li <liyonglong@...natelecom.cn> wrote:
>
>
>
> On 8/31/2022 1:58 PM, Yuchung Cheng wrote:
> > On Mon, Aug 29, 2022 at 5:23 PM Yuchung Cheng <ycheng@...gle.com> wrote:
> >>
> >> On Mon, Aug 29, 2022 at 1:21 AM Yonglong Li <liyonglong@...natelecom.cn> wrote:
> >>>
> >>> If RACK is enabled, when an skb is marked as lost we can remove it
> >>> from tsorted_sent_queue. This reduces the iterations over
> >>> tsorted_sent_queue in tcp_rack_detect_loss.
> >>
> >> Did you test the case where an skb is marked lost again after
> >> retransmission? I can't quite remember the reason I avoided this
> >> optimization. Let me run some tests and get back to you.
> > As I suspected, this patch fails to pass our packet drill tests.
> >
> > It breaks detecting retransmitted packets that
> > get lost again, b/c they have already been removed from the tsorted
> > list when they get lost the first time.
> >
> >
>
> Hi Yuchung,
> Thank you for your feedback.
> But I do not quite understand. In the current implementation, if an skb
> is marked lost again after retransmission, it will be added to the tail of
> tsorted_sent_queue again in tcp_update_skb_after_send.
> Did I miss some code?

That's correct, but in the kind of scenario Yuchung is talking about,
the skb is not retransmitted again.

To clarify, here is an example snippet of a test written by Yuchung
that covers this kind of case:

----
`../common/defaults.sh`

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
 +.02 < . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4
   +0 write(4, ..., 16000) = 16000
   +0 > P. 1:10001(10000) ack 1

// TLP (but it is dropped too so no ack for it)
 +.04 > . 10001:11001(1000) ack 1

// RTO and retransmit head
 +.22 > . 1:1001(1000) ack 1

// ACK was lost. But the (spurious) retransmit induced a DSACK.
// So total this ack hints two packets (original & dup).
// Undo cwnd and ssthresh.
 +.01 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
   +0 > P. 11001:13001(2000) ack 1
   +0 %{
assert tcpi_snd_cwnd == 12, tcpi_snd_cwnd
assert tcpi_snd_ssthresh > 1000000, tcpi_snd_ssthresh
}%

// TLP to discover the real losses 1001:11001(10000)
 +.04 > . 13001:14001(1000) ack 1

// Fast recovery. PRR first then PRR-SS after retransmits are acked
 +.01 < . 1:1(0) ack 1001 win 257 <sack 11001:12001,nop,nop>
   +0 > . 1001:2001(1000) ack 1
----

In this test case, with the proposed patch in this thread applied, the
final 1001:2001(1000) skb is transmitted 440ms later, after an RTO.
AFAICT that's because the 1001:2001(1000) skb was removed from the
tsorted list upon the original (spurious) RTO, but was not re-added upon
the undo of that spurious RTO. Since it is never retransmitted in
between, tcp_update_skb_after_send never puts it back on the list.
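
To make the sequence of events concrete, here is a rough toy model in
Python. It is only a sketch and not kernel code: ToySender, Skb, and all
method names are hypothetical, the reordering window is assumed to be
zero, and the "undo" step just clears the lost marks. It only illustrates
that an skb removed from the time-sorted list at loss-marking time is
invisible to a later RACK-style scan unless it is sent again.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare skbs by identity, like list nodes
class Skb:
    seq: int
    sent_at: float = 0.0
    lost: bool = False


class ToySender:
    """Hypothetical, heavily simplified model of a time-sorted sent list."""

    def __init__(self, remove_on_loss):
        # remove_on_loss=True mimics the behavior proposed in this thread.
        self.remove_on_loss = remove_on_loss
        self.tsorted = []

    def send(self, skb, now):
        # A (re)transmission moves the skb to the tail of the list.
        skb.sent_at = now
        if skb in self.tsorted:
            self.tsorted.remove(skb)
        self.tsorted.append(skb)

    def mark_lost(self, skb):
        skb.lost = True
        if self.remove_on_loss and skb in self.tsorted:
            self.tsorted.remove(skb)

    def undo(self, skbs):
        # Spurious-RTO undo: lost marks are cleared, nothing is re-sent,
        # so nothing is re-added to the list.
        for skb in skbs:
            skb.lost = False

    def sacked(self, skb):
        # A SACKed skb leaves the list; return its send time.
        if skb in self.tsorted:
            self.tsorted.remove(skb)
        return skb.sent_at

    def rack_detect_loss(self, delivered_sent_at):
        # Assumption: reordering window of zero. Anything still on the
        # list that was sent before the delivered skb is deemed lost.
        newly = [s for s in self.tsorted
                 if not s.lost and s.sent_at < delivered_sent_at]
        for skb in newly:
            self.mark_lost(skb)
        return [s.seq for s in newly]


def run(remove_on_loss):
    sender = ToySender(remove_on_loss)
    old = [Skb(seq) for seq in range(1001, 10001, 1000)]
    for skb in old:
        sender.send(skb, now=0.0)
    for skb in old:              # RTO marks the outstanding skbs lost
        sender.mark_lost(skb)
    sender.undo(old)             # DSACK shows the RTO was spurious
    new = Skb(11001)
    sender.send(new, now=0.3)    # new data sent after the undo
    return sender.rack_detect_loss(sender.sacked(new))


if __name__ == "__main__":
    print("keep on list:  ", run(remove_on_loss=False))
    print("remove on loss:", run(remove_on_loss=True))
```

In this toy model, the run that keeps skbs on the list flags 1001 and the
segments after it as lost once the newer skb is SACKed, while the run
that removes them at loss-marking time flags nothing, so recovery would
have to wait for a timer.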

best regards,
neal