Message-ID: <CAAHxn9_++G0icFE1F+NCfnj3AkErmytQ3LUz2C-oY-TJKbdwmg@mail.gmail.com>
Date: Wed, 21 May 2025 17:08:21 +0200
From: Simon Campion <simon.campion@...pl.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>, Kevin Yang <yyd@...gle.com>, Jon Maloy <jmaloy@...hat.com>
Subject: Re: Re: [EXT] Re: tcp: socket stuck with zero receive window after SACK
Great to hear we have a potential lead to investigate!
We have now seen this problem occur several times, on multiple nodes.
We tried two workarounds, neither of which helped:
* As far as we can tell, the patch Neal mentioned went into the 6.6.76
release, so we rolled some nodes back to an earlier Flatcar image with
kernel 6.6.74. The issue occurred on 6.6.74 as well.
* We disabled SACK on the nodes with the broken connections (not on the
nodes they connect to). The problem still occurs without SACK:
05:59:05.706056 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [P.], seq 306:315, ack 1, win 0, options [nop,nop,TS val 2554169028 ecr 1041911222], length 9
05:59:05.706142 eth1b In IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], ack 315, win 501, options [nop,nop,TS val 1041916342 ecr 2554169028], length 0
05:59:07.846543 eth1b In IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], seq 1:609, ack 315, win 501, options [nop,nop,TS val 1041918483 ecr 2554169028], length 608
05:59:07.846569 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [.], ack 1, win 0, options [nop,nop,TS val 2554171168 ecr 1041918483], length 0
05:59:10.826079 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [P.], seq 315:324, ack 1, win 0, options [nop,nop,TS val 2554174148 ecr 1041918483], length 9
05:59:10.826205 eth1b In IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], ack 324, win 501, options [nop,nop,TS val 1041921462 ecr 2554174148], length 0
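
To make the pattern explicit, here is a rough sketch that flags
connections which keep advertising a zero receive window on outgoing
packets. It is only an illustration: it assumes tcpdump output in
exactly the one-packet-per-line format above (interface column plus
the Out/In direction marker), and the three-packet threshold is
arbitrary.

import re
import sys
from collections import defaultdict

# Matches lines like:
# 05:59:05.706056 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: ... win 0, ...
# tcpdump prints the raw (unscaled) window, which is fine for detecting zero.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) \S+ (?P<dir>Out|In) IP "
    r"(?P<src>[\d.]+) > (?P<dst>[\d.]+): .* win (?P<win>\d+)"
)

zero_streak = defaultdict(int)  # (src, dst) -> consecutive zero-window packets sent

for line in sys.stdin:
    m = LINE_RE.match(line)
    if not m or m.group("dir") != "Out":
        continue  # only our own outgoing window advertisements are of interest
    conn = (m.group("src"), m.group("dst"))
    if int(m.group("win")) == 0:
        zero_streak[conn] += 1
        if zero_streak[conn] == 3:  # arbitrary threshold for "stuck"
            print(f"{m.group('ts')} {conn[0]} -> {conn[1]}: receive window still 0")
    else:
        zero_streak[conn] = 0  # window opened up again

Fed the six packets above, this reports the connection on the third
consecutive outgoing zero-window advertisement at 05:59:10.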
Another important piece of information, which I should have included in
my first message: we have net.ipv4.tcp_shrink_window=1 set. We have now
disabled it to check whether that avoids the issue.
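
For completeness, a minimal sketch of how both knobs can be flipped via
/proc/sys (equivalent to sysctl -w; it needs root, and it is only a
sketch, not the exact mechanism we use on these nodes). Note that
net.ipv4.tcp_sack only affects connections established after the
change, since SACK is negotiated during the handshake.

from pathlib import Path

# Sketch: flip the two sysctls discussed in this thread via /proc/sys.
SETTINGS = {
    "net.ipv4.tcp_sack": "0",           # disable SACK (workaround above)
    "net.ipv4.tcp_shrink_window": "0",  # back to the default of 0
}

for name, value in SETTINGS.items():
    path = Path("/proc/sys") / name.replace(".", "/")
    print(f"{name}: {path.read_text().strip()} -> {value}")
    path.write_text(value)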
Thanks for all your help!
Simon