netdev - Re: retrans_stamp not cleared while testing NewReno implementation.

Message-ID: <CADVnQymOSFQSDczDa2VWF8XbbrHbQ1sFwoTjDvvdWh7+BP5Big@mail.gmail.com>
Date:   Fri, 2 Sep 2022 15:17:32 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     "Arankal, Nagaraj" <nagaraj.p.arankal@....com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: retrans_stamp not cleared while testing NewReno implementation.

On Fri, Sep 2, 2022 at 10:19 AM Arankal, Nagaraj
<nagaraj.p.arankal@....com> wrote:
>
> Hi Neal,
> Thanks a lot.
> I shall update my kernel with the proposed fix and run the tests again.

Great. Thanks for testing the patch!

I wrote a packetdrill test case, based on your scenario, and was able
to reproduce the problem with it. I have pasted that test below,
followed by a packetdrill test showing the fixed behavior after the
proposed fix patch is applied.

We will post an official patch on the list for review/discussion.

### Here is a version showing the buggy behavior on an unpatched
kernel. The TCP sender only sends 2 RTO retransmissions before timing
out the connection, even though we requested 5 retries
(net.ipv4.tcp_retries2=5):

// Reproduce a scenario reported in the netdev thread:
//  "retrans_stamp not cleared while testing NewReno implementation."
// Ensure that retrans_stamp is cleared during TS undo of an RTO episode.

--tcp_ts_tick_usecs=1000

// Set tcp_retries2 to a low value so that we time out sockets quickly.
`../common/defaults.sh
sysctl -q net.ipv4.tcp_retries2=5`

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1012,nop,nop,TS val 100 ecr 0,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,TS val 0 ecr 100,nop,wscale 8>
+.020 < . 1:1(0) ack 1 win 257 <nop,nop,TS val 120 ecr 0>
   +0 accept(3, ..., ...) = 4
   +0 write(4, ..., 1000) = 1000
   +0 > P. 1:1001(1000) ack 1 <nop,nop,TS val 20 ecr 100>
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

// RTO and retransmit head spuriously.
+.220 > P. 1:1001(1000) ack 1 <nop,nop,TS val 220 ecr 100>
+0 %{ assert tcpi_snd_cwnd == 1, tcpi_snd_cwnd }%
+0 %{ assert tcpi_ca_state == TCP_CA_Loss }%

// ACK arrives with an ECR indicating it's ACKing the original skb,
// so we undo the loss recovery. However, since this is a non-SACK
// connection we remain in CA_Loss.
+.005 < . 1:1(0) ack 1001 win 257 <nop,nop,TS val 140 ecr 20>
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
+0 %{ assert tcpi_ca_state == TCP_CA_Loss }%


// Much later we send something.
+11 write(4, ..., 1000) = 1000
   +0 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 11253 ecr 140>
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

// RTO and retransmit head.
+.290 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 11540 ecr 140>

// RTO and retransmit head.
+.618 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 12148 ecr 140>

// Check whether connection is timed out yet (it should not be, but on
// an unpatched kernel it already is):
+1.30 write(4, ..., 1000) = -1 ETIMEDOUT (Connection Timed Out)
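
A rough way to see why the unpatched trace gives up early is to model
the give-up check as "the time since retrans_stamp has reached a budget
derived from tcp_retries2". The Python sketch below is a simplified
model, not kernel code: the function names, the 200 ms base RTO, and the
budget formula ((2 << retries) - 1) * base are assumptions made for
illustration. The times are the TS val values from the trace above, read
as milliseconds, plus an assumed expiry time for the third RTO timer.

```python
# Simplified model of the RTO give-up check. All names are illustrative,
# not kernel identifiers. Times are in milliseconds (TS val units).

RTO_BASE_MS = 200  # assumption: base RTO used to derive the budget


def model_timeout_ms(retries: int, rto_base_ms: int = RTO_BASE_MS) -> int:
    """Assumed budget: time spanned by `retries` exponentially backed-off RTOs."""
    return ((2 << retries) - 1) * rto_base_ms


def timed_out(now_ms: int, retrans_stamp_ms: int, retries: int) -> bool:
    """Give up once the time since retrans_stamp reaches the budget."""
    return now_ms - retrans_stamp_ms >= model_timeout_ms(retries)


RETRIES2 = 5
STALE_STAMP_MS = 220       # first retransmit of the spurious RTO episode
FRESH_STAMP_MS = 11540     # first retransmit of the later RTO episode
THIRD_TIMER_MS = 13364     # assumed: about 1.2 s after the retransmit at 12148

print("budget (ms):", model_timeout_ms(RETRIES2))
print("stale stamp, 2nd timer:", timed_out(12148, STALE_STAMP_MS, RETRIES2))
print("stale stamp, 3rd timer:", timed_out(THIRD_TIMER_MS, STALE_STAMP_MS, RETRIES2))
print("fresh stamp, 3rd timer:", timed_out(THIRD_TIMER_MS, FRESH_STAMP_MS, RETRIES2))
```

Under these assumptions, a stamp left over from the spurious RTO episode
trips the check at the third timer expiry, which is consistent with the
trace above stopping after 2 retransmissions. A stamp taken at the first
retransmission of the new episode does not.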

### Here is a version showing the fixed behavior on a patched kernel.
The TCP sender correctly sends 5 RTO retransmissions before timing out
the connection, since we requested 5 retries
(net.ipv4.tcp_retries2=5):

// Reproduce a scenario reported in the netdev thread:
//  "retrans_stamp not cleared while testing NewReno implementation."
// Ensure that retrans_stamp is cleared during TS undo of an RTO episode.

--tcp_ts_tick_usecs=1000

// Set tcp_retries2 to 5 so that we should get exactly 5
// RTO retransmissions before the connection times out
// and returns ETIMEDOUT.
`../common/defaults.sh
sysctl -q net.ipv4.tcp_retries2=5`

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1012,nop,nop,TS val 100 ecr 0,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,TS val 0 ecr 100,nop,wscale 8>
+.020 < . 1:1(0) ack 1 win 257 <nop,nop,TS val 120 ecr 0>
   +0 accept(3, ..., ...) = 4
   +0 write(4, ..., 1000) = 1000
   +0 > P. 1:1001(1000) ack 1 <nop,nop,TS val 20 ecr 100>
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

// RTO and retransmit head spuriously.
+.220 > P. 1:1001(1000) ack 1 <nop,nop,TS val 220 ecr 100>
+0 %{ assert tcpi_snd_cwnd == 1, tcpi_snd_cwnd }%
+0 %{ assert tcpi_ca_state == TCP_CA_Loss }%

// ACK arrives with an ECR indicating it's ACKing the original skb,
// so we undo the loss recovery. However, since this is a non-SACK
// connection we remain in CA_Loss.
+.005 < . 1:1(0) ack 1001 win 257 <nop,nop,TS val 140 ecr 20>
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
+0 %{ assert tcpi_ca_state == TCP_CA_Loss }%

// Much later we send something.
+11 write(4, ..., 1000) = 1000
   +0 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 11253 ecr 140>
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

// RTO and retransmit head.
+.290 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 11540 ecr 140>

// RTO and retransmit head.
+.618 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 12148 ecr 140>

// RTO and retransmit head.
+1.216 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 12148 ecr 140>

// RTO and retransmit head.
+2.432 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 15768 ecr 140>

// RTO and retransmit head.
+4.864 > P. 1001:2001(1000) ack 1 <nop,nop,TS val 20642 ecr 140>

// Check that the connection has now timed out (it should have, after
// 5 RTO retransmissions):
+9.8 write(4, ..., 1000) = -1 ETIMEDOUT (Connection Timed Out)
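
To make the timeline of the patched trace easier to read, the relative
delays in the script above can be summed into offsets from the second
write(). The Python sketch below only does that arithmetic. The variable
names are illustrative, and the delays are copied from the script rather
than measured.

```python
from itertools import accumulate

# Relative delays (ms) in the patched trace, starting from the second
# write(): five RTO retransmissions, then the final write() check.
RETRANSMIT_DELAYS_MS = [290, 618, 1216, 2432, 4864]
FINAL_CHECK_DELAY_MS = 9800

# Running sum turns relative delays into offsets from the second write().
retransmit_offsets_ms = list(accumulate(RETRANSMIT_DELAYS_MS))
final_check_offset_ms = retransmit_offsets_ms[-1] + FINAL_CHECK_DELAY_MS

for n, offset in enumerate(retransmit_offsets_ms, start=1):
    print(f"retransmission {n}: +{offset / 1000:.3f} s")
print(f"ETIMEDOUT check: +{final_check_offset_ms / 1000:.3f} s")
```

The fifth retransmission lands about 9.4 s after the second write(), and
the final write() that expects ETIMEDOUT runs about 19.2 s after it.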