Message-ID: <18cd7e48-d719-bc82-9dbc-67cbb42eed83@gmail.com>
Date: Wed, 6 Apr 2022 10:40:18 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Erin MacNeil <emacneil@...iper.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: TCP stack gets into state of continually advertising “silly window” size of 1
On 4/6/22 07:19, Erin MacNeil wrote:
> This issue has been observed with the 4.8.28 kernel; I am wondering whether it is a known issue with an available fix.
>
> Description:
> Device A hosting IP address <Device A i/f addr> is running Linux version: 4.8.28, and device B hosting IP address <Device B i/f addr> is non-Linux based.
> Both devices are configured with an interface MTU of 9114 bytes.
>
> The TCP connection gets established via frames 1418-1419, where a window size and MSS of 9060 are agreed upon; SACK is disabled because device B does not support it, and window scaling is not in play.
>
> No. Time Source Destination Protocol Length Info
> *1418 2022-03-15 06:52:49.693168 <Device A i/f addr> <Device B i/f addr> TCP 122 57486 -> 179 [SYN] Seq=0 Win=9060 Len=0 MSS=9060 SACK_PERM=1 TSval=3368771415 TSecr=0 WS=1
> *1419 2022-03-15 06:52:49.709325 <Device B i/f addr> <Device A i/f addr> TCP 132 179 -> 57486 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=9060 WS=1
> ...
> 4661 2022-03-15 06:53:52.437668 <Device B i/f addr> <Device A i/f addr> BGP 9184
> 4662 2022-03-15 06:53:52.437747 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9658065 Win=9060 Len=0
> 4663 2022-03-15 06:53:52.454599 <Device B i/f addr> <Device A i/f addr> BGP 9184
> 4664 2022-03-15 06:53:52.454661 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9667125 Win=9060 Len=0
> 4665 2022-03-15 06:53:52.471377 <Device B i/f addr> <Device A i/f addr> BGP 9184
> 4666 2022-03-15 06:53:52.512396 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676185 Win=0 Len=0
> 4667 2022-03-15 06:53:52.828918 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676185 Win=9060 Len=0
> 4668 2022-03-15 06:53:52.829001 <Device B i/f addr> <Device A i/f addr> BGP 125
> 4669 2022-03-15 06:53:52.829032 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676186 Win=9060 Len=0
> 4670 2022-03-15 06:53:52.845494 <Device B i/f addr> <Device A i/f addr> BGP 9184
> *4671 2022-03-15 06:53:52.845532 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9685245 Win=1 Len=0
> 4672 2022-03-15 06:53:52.861968 <Device B i/f addr> <Device A i/f addr> TCP 125 179 -> 57486 [ACK] Seq=9685245 Ack=3177223 Win=27803 Len=1
> ...
> At frame 4671, some 63 seconds after the connection was established, device A advertises a window size of 1, and the connection never recovers from this; a window size of 1 is continually advertised. The issue seems to be triggered by device B sending a TCP window probe conveying a single byte of data (the next byte in its send window) in frame 4668; when device A ACKs this probe, it also re-advertises its receive window as 9060. The next packet from device B, frame 4670, conveys 9060 bytes of data, the first byte of which is the same byte that it sent in frame 4668; device A has already ACKed that byte, but device B may not yet have seen the ACK.
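The Win=1 value in frame 4671 is at least consistent with simple window arithmetic. The sketch below is only an illustration, assuming the receiver never moves the right edge of its advertised window backwards and that no window scaling applies; the helper name remaining_window is hypothetical. It uses the Ack/Win values shown in frames 4669 and 4671 of the capture. It accounts for the first advertisement of 1 only, not for why the window stays at 1 afterwards.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def remaining_window(prev_ack: int, prev_win: int, new_ack: int) -> int:
    """Bytes left between the new ACK point and the previously advertised right edge.

    Assumes the receiver does not shrink its window, i.e. the right edge
    (prev_ack + prev_win) stays where it was when the next ACK is sent.
    """
    right_edge = (prev_ack + prev_win) % SEQ_MOD
    return (right_edge - new_ack) % SEQ_MOD


# Frame 4669 advertised Ack=9676186 Win=9060 (after ACKing the 1-byte probe).
# Frame 4670 re-sent that byte plus 9059 new bytes, so frame 4671 ACKs 9685245.
print(remaining_window(9676186, 9060, 9685245))  # prints 1
```

The same result falls out of the absolute sequence numbers in the sequence detail (Ack 4236382325 with a 9060-byte window, then Ack 4236391384): the overlapping byte means the 9060-byte segment consumes only 9059 bytes of the advertised window.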
>
> On device A, the TCP socket was configured with setsockopt() SO_RCVBUF & SO_SNDBUF values of 16k.
Presumably, 16k buffers are not adequate when the MTU is 9000.
The kernel has some logic to enforce a minimum value, based on standard MTU
sizes.
Have you tried not calling setsockopt() with SO_RCVBUF and SO_SNDBUF?
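A quick way to compare the two configurations is to read back what the kernel reports for a socket with and without the explicit setsockopt() calls. This is a minimal sketch, not the application's actual code: the helper name effective_bufsizes is hypothetical, and the 16k value is taken from the report above. On Linux, socket(7) documents that the kernel doubles the value passed to SO_RCVBUF/SO_SNDBUF, and tcp(7) notes that setting SO_RCVBUF disables receive-buffer autotuning for that socket; other systems may behave differently.

```python
import socket


def effective_bufsizes(rcvbuf=None, sndbuf=None):
    """Return (SO_RCVBUF, SO_SNDBUF) as reported by the kernel for a new TCP socket.

    If rcvbuf/sndbuf are given, they are set first via setsockopt();
    otherwise the kernel defaults are reported.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if rcvbuf is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        if sndbuf is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        return (
            s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        )


if __name__ == "__main__":
    # Values differ per system and sysctl settings, so none are assumed here.
    print("kernel defaults:     ", effective_bufsizes())
    print("after 16k setsockopt:", effective_bufsizes(16 * 1024, 16 * 1024))
```

Comparing the two lines against the 9060-byte MSS shows how many full-sized segments fit in each configuration.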
>
> Here is the sequence detail:
>
> |2022-03-15 06:53:52.437668| ACK - Len: 9060 |Seq = 4236355144 Ack = 502383504 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.437747| ACK | |Seq = 502383551 Ack = 4236364204 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.454599| ACK - Len: 9060 |Seq = 4236364204 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.454661| ACK | |Seq = 502383551 Ack = 4236373264 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.471377| ACK - Len: 9060 |Seq = 4236373264 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.512396| ACK | |Seq = 502383551 Ack = 4236382324 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.828918| ACK | |Seq = 502383551 Ack = 4236382324 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.829001| ACK - Len: 1 |Seq = 4236382324 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.829032| ACK | |Seq = 502383551 Ack = 4236382325 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.845494| ACK - Len: 9060 |Seq = 4236382324 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.845532| ACK | |Seq = 502383551 Ack = 4236391384 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.861968| ACK - Len: 1 |Seq = 4236391384 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.862022| ACK | |Seq = 502383551 Ack = 4236391385 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.878445| ACK - Len: 1 |Seq = 4236391385 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.878529| ACK | |Seq = 502383551 Ack = 4236391386 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.895212| ACK - Len: 1 |Seq = 4236391386 Ack = 502383551 | |(57486) <------------------ (179) |
>
>
> There is no data in the recv-q or send-q at this point, yet the window stays at size 1:
>
> $ ss -o state established -ntepi '( dport = 179 or sport = 179 )' dst <Device B i/f addr>
> Recv-Q Send-Q Local Address:Port Peer Address:Port
> 0 0 <Device A i/f addr>:57486 <Device B i/f addr>:179 ino:1170981660 sk:d9d <->
>
>
> Thanks
> -Erin
>
> --
>
> Juniper Business Use Only