Message-ID: <CAAHxn9_++G0icFE1F+NCfnj3AkErmytQ3LUz2C-oY-TJKbdwmg@mail.gmail.com>
Date: Wed, 21 May 2025 17:08:21 +0200
From: Simon Campion <simon.campion@...pl.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>, 
	Yuchung Cheng <ycheng@...gle.com>, Kevin Yang <yyd@...gle.com>, Jon Maloy <jmaloy@...hat.com>
Subject: Re: Re: [EXT] Re: tcp: socket stuck with zero receive window after SACK

Great to hear we have a potential lead to investigate!

We've now seen this problem occur several times, on several different
nodes. We tried two workarounds, neither of which helped:
* As far as we can tell, the patch Neal mentioned was included in the
6.6.76 release. We rolled back some nodes to an earlier Flatcar image
with kernel 6.6.74, but the issue occurred on 6.6.74 as well.
* We disabled SACK on the nodes with broken connections (not on the
nodes they connect to). The problem occurs in the absence of SACK as
well:
05:59:05.706056 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [P.], seq 306:315, ack 1, win 0, options [nop,nop,TS val 2554169028 ecr 1041911222], length 9
05:59:05.706142 eth1b In  IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], ack 315, win 501, options [nop,nop,TS val 1041916342 ecr 2554169028], length 0
05:59:07.846543 eth1b In  IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], seq 1:609, ack 315, win 501, options [nop,nop,TS val 1041918483 ecr 2554169028], length 608
05:59:07.846569 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [.], ack 1, win 0, options [nop,nop,TS val 2554171168 ecr 1041918483], length 0
05:59:10.826079 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [P.], seq 315:324, ack 1, win 0, options [nop,nop,TS val 2554174148 ecr 1041918483], length 9
05:59:10.826205 eth1b In  IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], ack 324, win 501, options [nop,nop,TS val 1041921462 ecr 2554174148], length 0
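The pattern in the trace can be checked mechanically. Below is a minimal Python sketch, assuming the one-packet-per-line tcpdump format shown above (interface name plus In/Out direction); the regex only covers the fields that appear in this trace, and the helper names `parse` and `unacked_inbound_data` are hypothetical. It collects the window values each side advertises and lists inbound data segments that the next outbound segment does not acknowledge.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Matches only the tcpdump fields used in the trace above (interface name and
# In/Out direction present). Other tcpdump output formats are not handled.
LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<iface>\S+) (?P<dir>In|Out)\s+IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]*)\],(?: seq (?P<seq>\d+(?::\d+)?),)? "
    r"ack (?P<ack>\d+), win (?P<win>\d+),.*length (?P<length>\d+)$"
)

TRACE = [
    "05:59:05.706056 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [P.], seq 306:315, ack 1, win 0, options [nop,nop,TS val 2554169028 ecr 1041911222], length 9",
    "05:59:05.706142 eth1b In  IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], ack 315, win 501, options [nop,nop,TS val 1041916342 ecr 2554169028], length 0",
    "05:59:07.846543 eth1b In  IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], seq 1:609, ack 315, win 501, options [nop,nop,TS val 1041918483 ecr 2554169028], length 608",
    "05:59:07.846569 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [.], ack 1, win 0, options [nop,nop,TS val 2554171168 ecr 1041918483], length 0",
    "05:59:10.826079 eth1b Out IP 10.70.3.80.57136 > 10.70.3.46.6920: Flags [P.], seq 315:324, ack 1, win 0, options [nop,nop,TS val 2554174148 ecr 1041918483], length 9",
    "05:59:10.826205 eth1b In  IP 10.70.3.46.6920 > 10.70.3.80.57136: Flags [.], ack 324, win 501, options [nop,nop,TS val 1041921462 ecr 2554174148], length 0",
]


@dataclass
class Segment:
    time: str
    direction: str                   # "In" or "Out", as seen on the capturing host
    seq: Optional[Tuple[int, int]]   # (start, end) relative sequence range, end exclusive
    ack: int
    win: int                         # raw window field as printed by tcpdump (unscaled)
    length: int


def parse(line: str) -> Segment:
    """Parse one tcpdump line into a Segment; raise ValueError if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised tcpdump line: {line!r}")
    seq = None
    if m["seq"] is not None:
        start, _, end = m["seq"].partition(":")
        seq = (int(start), int(end or start))
    return Segment(m["time"], m["dir"], seq, int(m["ack"]), int(m["win"]), int(m["length"]))


def unacked_inbound_data(segments: List[Segment]) -> List[Segment]:
    """Inbound data segments whose end is not covered by the next outbound ACK."""
    result = []
    for i, seg in enumerate(segments):
        if seg.direction != "In" or seg.length == 0 or seg.seq is None:
            continue
        reply = next((s for s in segments[i + 1:] if s.direction == "Out"), None)
        if reply is not None and reply.ack < seg.seq[1]:
            result.append(seg)
    return result


if __name__ == "__main__":
    segments = [parse(line) for line in TRACE]
    print("windows advertised by 10.70.3.80:", {s.win for s in segments if s.direction == "Out"})
    print("windows advertised by 10.70.3.46:", {s.win for s in segments if s.direction == "In"})
    for seg in unacked_inbound_data(segments):
        print(f"{seg.time}: inbound bytes {seg.seq[0]}:{seg.seq[1]} not acknowledged")
```

On this trace, every outbound segment from 10.70.3.80 advertises win 0, and the 608-byte segment that arrives at 05:59:07.846543 is answered with ack 1, so those bytes are never acknowledged while the local side keeps sending its own data.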

Another important piece of information (which I should have included in
my first message!): we set net.ipv4.tcp_shrink_window=1. We have now
disabled it to check whether that avoids the issue.
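For reference, one way to keep the setting disabled across reboots is a sysctl.d fragment like the one below. The file name is a hypothetical example; per the kernel's ip-sysctl documentation, 0 (the default) means the advertised receive window is never shrunk.

```
# /etc/sysctl.d/90-tcp-shrink-window.conf  (hypothetical file name)
# 0 = never shrink the advertised TCP receive window (kernel default)
net.ipv4.tcp_shrink_window = 0
```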

Thanks for all your help!
Simon
