Message-ID: <CANn89iKPdAVdPo1g15dEp3smAjM2rY0T25p3y2Dzu-poFk5kWA@mail.gmail.com>
Date: Fri, 3 Nov 2023 10:53:22 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Michal Kubecek <mkubecek@...e.cz>
Cc: Jiri Slaby <jirislaby@...nel.org>, "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
Soheil Hassas Yeganeh <soheil@...gle.com>, Neal Cardwell <ncardwell@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>, eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: get rid of sysctl_tcp_adv_win_scale
On Fri, Nov 3, 2023 at 10:27 AM Michal Kubecek <mkubecek@...e.cz> wrote:
>
> On Fri, Nov 03, 2023 at 09:17:27AM +0100, Eric Dumazet wrote:
> >
> > It seems the test had some expectations.
> >
> > Setting a small (1 byte) RCVBUF/SNDBUF, and yet expecting to send
> > 46080 bytes fast enough was not reasonable.
> > It might have relied on the fact that tcp sendmsg() can cook large GSO
> > packets, even if sk->sk_sndbuf is small.
> >
> > With tight memory settings, it is possible TCP has to resort to RTO
> > timers (200ms by default) to recover from dropped packets.
>
> There seems to be one drop but somehow the sender does not recover from
> it, even if the retransmit and following packets are acked quickly:
>
> 09:15:29.424017 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [S], seq 104649613, win 33280, options [mss 65495,sackOK,TS val 1319295278 ecr 0,nop,wscale 7], length 0
> 09:15:29.424024 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [S.], seq 1343383818, ack 104649614, win 585, options [mss 65495,sackOK,TS val 1319295278 ecr 1319295278,nop,wscale 0], length 0
> 09:15:29.424031 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 1, win 260, options [nop,nop,TS val 1319295278 ecr 1319295278], length 0
> 09:15:29.424155 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [.], seq 1:16641, ack 1, win 585, options [nop,nop,TS val 1319295279 ecr 1319295278], length 16640
> 09:15:29.424160 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 16641, win 130, options [nop,nop,TS val 1319295279 ecr 1319295279], length 0
> 09:15:29.424179 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 16641:33281, ack 1, win 585, options [nop,nop,TS val 1319295279 ecr 1319295279], length 16640
> 09:15:29.424183 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 16641, win 0, options [nop,nop,TS val 1319295279 ecr 1319295279], length 0
> 09:15:29.424280 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [P.], seq 1:12, ack 16641, win 16640, options [nop,nop,TS val 1319295279 ecr 1319295279], length 11
> 09:15:29.424284 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [.], ack 12, win 574, options [nop,nop,TS val 1319295279 ecr 1319295279], length 0
> 09:15:29.630272 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 16641:33281, ack 12, win 574, options [nop,nop,TS val 1319295485 ecr 1319295279], length 16640
> 09:15:29.630334 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 33281, win 2304, options [nop,nop,TS val 1319295485 ecr 1319295485], length 0
> 09:15:29.836938 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 33281:35585, ack 12, win 574, options [nop,nop,TS val 1319295691 ecr 1319295485], length 2304
> 09:15:29.836984 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 35585, win 2304, options [nop,nop,TS val 1319295691 ecr 1319295691], length 0
> 09:15:30.043606 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 35585:37889, ack 12, win 574, options [nop,nop,TS val 1319295898 ecr 1319295691], length 2304
> 09:15:30.043653 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 37889, win 2304, options [nop,nop,TS val 1319295898 ecr 1319295898], length 0
> 09:15:30.250270 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 37889:40193, ack 12, win 574, options [nop,nop,TS val 1319296105 ecr 1319295898], length 2304
> 09:15:30.250316 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 40193, win 2304, options [nop,nop,TS val 1319296105 ecr 1319296105], length 0
> 09:15:30.456932 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 40193:42497, ack 12, win 574, options [nop,nop,TS val 1319296311 ecr 1319296105], length 2304
> 09:15:30.456975 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 42497, win 2304, options [nop,nop,TS val 1319296311 ecr 1319296311], length 0
> 09:15:30.663598 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 42497:44801, ack 12, win 574, options [nop,nop,TS val 1319296518 ecr 1319296311], length 2304
> 09:15:30.663638 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 44801, win 2304, options [nop,nop,TS val 1319296518 ecr 1319296518], length 0
> 09:15:30.663646 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [FP.], seq 44801:46081, ack 12, win 574, options [nop,nop,TS val 1319296518 ecr 1319296518], length 1280
> 09:15:30.663712 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [F.], seq 12, ack 46082, win 2304, options [nop,nop,TS val 1319296518 ecr 1319296518], length 0
> 09:15:30.663724 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [.], ack 13, win 573, options [nop,nop,TS val 1319296518 ecr 1319296518], length 0
>
> (Window size values are scaled here.) Part of the problem is that the
> receiver side sets SO_RCVBUF after connect(), so the window shrinks
> after the sender has already sent more data; when I move the bufsized()
> calls in the python script before listen() and connect(), the test runs
> quickly.
This makes sense.
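
For reference, a minimal sketch of that ordering, with the buffer sizes set
before listen() and connect(). This is not the actual test script: the
make_listener() and make_client() helpers are hypothetical names used only
for illustration, and the kernel is free to clamp or adjust the requested
sizes.

```python
import socket


def make_listener(bufsize):
    """Create a loopback listener with its buffers sized before listen()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Size the buffers first, so the window advertised in the SYN-ACK
    # already reflects them.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(1)
    return srv


def make_client(addr, bufsize):
    """Connect to addr with the buffers sized before connect()."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Same idea on the active side: the SYN carries the initial window,
    # so SO_RCVBUF has to be in place before connect().
    cli.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    cli.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    cli.connect(addr)
    return cli
```

Calling make_listener(n), then make_client(srv.getsockname(), n), then
srv.accept() gives a connection whose window never has to be retracted
after data is already in flight.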
Old kernels would instead have dropped a packet, without changing the test status:
09:49:49.390066 IP localhost.39710 > localhost.44173: Flags [S], seq 1464131415, win 65495, options [mss 65495,sackOK,TS val 578664891 ecr 0,nop,wscale 7], length 0
09:49:49.390078 IP localhost.44173 > localhost.39710: Flags [S.], seq 2322612108, ack 1464131416, win 1152, options [mss 65495,sackOK,TS val 578664891 ecr 578664891,nop,wscale 0], length 0
09:49:49.390088 IP localhost.39710 > localhost.44173: Flags [.], ack 1, win 512, options [nop,nop,TS val 578664891 ecr 578664891], length 0
09:49:49.390319 IP localhost.44173 > localhost.39710: Flags [.], seq 1:32769, ack 1, win 1152, options [nop,nop,TS val 578664892 ecr 578664891], length 32768
09:49:49.390325 IP localhost.39710 > localhost.44173: Flags [.], ack 32769, win 256, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390355 IP localhost.44173 > localhost.39710: Flags [P.], seq 32769:46081, ack 1, win 1152, options [nop,nop,TS val 578664892 ecr 578664892], length 13312
<prior packet has been dropped by receiver>
09:49:49.390479 IP localhost.39710 > localhost.44173: Flags [P.], seq 1:12, ack 32769, win 256, options [nop,nop,TS val 578664892 ecr 578664892], length 11
09:49:49.390483 IP localhost.44173 > localhost.39710: Flags [.], ack 12, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390547 IP localhost.44173 > localhost.39710: Flags [F.], seq 46081, ack 12, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390552 IP localhost.39710 > localhost.44173: Flags [.], ack 32769, win 256, options [nop,nop,TS val 578664892 ecr 578664892,nop,nop,sack 1 {46081:46082}], length 0
<packet retransmit>
09:49:49.390562 IP localhost.44173 > localhost.39710: Flags [P.], seq 32769:46081, ack 12, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 13312
09:49:49.390567 IP localhost.39710 > localhost.44173: Flags [.], ack 46082, win 152, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390677 IP localhost.39710 > localhost.44173: Flags [F.], seq 12, ack 46082, win 152, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390685 IP localhost.44173 > localhost.39710: Flags [.], ack 13, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 0
Retracting TCP windows has always been problematic.
If we really want to be very gentle, we could add more logic, such as
shorter timer events for pathological cases like this one.
I am not sure this is really worth it, especially when dealing with one
million TCP sockets in this state.
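
To put numbers on the pathological case, here is a small sketch that
computes the delay between consecutive data segments sent by the server
(port 42483) in the first trace above. The timestamps are copied from that
trace; the send_gaps() helper is a made-up name for illustration only.

```python
# Send times (seconds past 09:15) of the server's data segments, copied
# from the first trace above, starting with the segment that was dropped.
SEND_TIMES = [
    29.424179,  # seq 16641:33281 (dropped by the receiver)
    29.630272,  # seq 16641:33281 (retransmit)
    29.836938,  # seq 33281:35585
    30.043606,  # seq 35585:37889
    30.250270,  # seq 37889:40193
    30.456932,  # seq 40193:42497
    30.663598,  # seq 42497:44801
]


def send_gaps(times):
    """Return the delay between each pair of consecutive sends, in seconds."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


gaps = send_gaps(SEND_TIMES)
for gap in gaps:
    print(f"{gap * 1000:.1f} ms")
print(f"total: {sum(gaps):.3f} s across {len(gaps)} waits")
```

Every gap lands slightly above 200 ms, which is consistent with the sender
sitting on a timer before each 2304-byte segment, for roughly 1.24 s in
total.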