[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iL7XCLBxsUnV3c_5AD8eSJ=jXs6o_KJUjmZAGo6_6sqUg@mail.gmail.com>
Date: Fri, 26 Feb 2021 11:41:22 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>
Subject: Re: Spurious TCP retransmissions on ack vs kfree_skb reordering
On Fri, Feb 26, 2021 at 11:05 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Feb 26, 2021 at 4:15 AM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Thu, 25 Feb 2021 15:25:15 -0800 Jakub Kicinski wrote:
> > > Hi!
> > >
> > > We see large (4-8x) increase of what looks like TCP RTOs after rising
> > > the Tx coalescing above Rx coalescing timeout.
> > >
> > > Quick tracing of the events seems to indicate that the data has already
> > > been acked when we enter tcp:tcp_retransmit_skb:
> >
> > Seems like I'm pretty lost here and the tcp:tcp_retransmit_skb events
> > are less spurious than I thought. Looking at some tcpdump traces we see:
> >
> > 0.045277 IP6 A > B: Flags [SEW], seq 2248382925:2248383296, win 61920, options [mss 1440,sackOK,TS val 658870494 ecr 0,nop,wscale 11], length 371
> >
> > 0.045348 IP6 B > A: Flags [S.E], seq 961169456, ack 2248382926, win 65535, options [mss 1440,sackOK,TS val 883864022 ecr 658870494,nop,wscale 9], length 0
>
> The SYNACK does not include the prior payload.
>
> > 0.045369 IP6 A > B: Flags [P.], seq 1:372, ack 1, win 31, options [nop,nop,TS val 658870494 ecr 883864022], length 371
>
> So this rtx is not spurious.
>
> However in your prior email you wrote :
>
> bytes_in: 0
> bytes_out: 742
> bytes_acked: 742
>
> Are you sure that at the time of the retransmit, bytes_acked was 742 ?
> I do not see how this could happen.
Yes, this packetdrill test confirms TCP INFO stuff seems correct .
//
// SYN-data are not fully acknowledged
//
`../../common/defaults.sh
sysctl -wq net.ipv4.tcp_timestamps=0 `
// Cache warmup: send a Fast Open cookie request
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 sendto(3, ..., 0, MSG_FASTOPEN, ..., ...) = -1 EINPROGRESS
(Operation is now in progress)
+0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
+.01 < S. 123:123(0) ack 1 win 5840 <mss
1460,nop,nop,sackOK,nop,wscale 6,FO 1234abcd,nop,nop>
+0 %{ assert tcpi_delivered == 1, tcpi_delivered }%
+0 > . 1:1(0) ack 1
+.01 close(3) = 0
+0 > F. 1:1(0) ack 1
+.01 < F. 1:1(0) ack 2 win 92
+0 > . 2:2(0) ack 2
//
// Test: the data in SYN-data are retransmitted when server acks only ISN
//
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 sendto(4, ..., 700, MSG_FASTOPEN, ..., ...) = 700
+0 > S 0:700(700) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO 1234abcd,nop,nop>
+.02 < S. 0:0(0) ack 1 win 5840 <mss 1040,nop,nop,sackOK,nop,wscale
6,FO 1234abcd,nop,nop>
// Retransmits data in SYN-data on the first ACK.
+0 %{ assert tcpi_delivered == 1, tcpi_delivered }%
+0 %{ assert tcpi_bytes_acked == 1, tcpi_bytes_acked }%
+0 %{ assert tcpi_bytes_sent == 2*700, tcpi_bytes_sent }%
+0 > P. 1:701(700) ack 1
+0 < P. 1:1001(1000) ack 701 win 257
+0 %{ assert tcpi_delivered == 2, tcpi_delivered }%
+0 %{ assert tcpi_bytes_acked == 701, tcpi_bytes_acked }%
+0 > . 701:701(0) ack 1001
+0 read(4, ..., 1024) = 1000
+0 %{ assert (tcpi_options & TCPI_OPT_SYN_DATA) == 0, tcpi_options }%
+0 close(4) = 0
+0 > F. 701:701(0) ack 1001
+.02 < F. 1001:1001(0) ack 702 win 257
+0 > . 702:702(0) ack 1002
Powered by blists - more mailing lists