Message-ID: <CAAHxn9_waCMAh3Me63WQv+1h=FmT10grA13t09xaym4hX1KgCg@mail.gmail.com>
Date: Thu, 22 May 2025 12:34:18 +0200
From: Simon Campion <simon.campion@...pl.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>, 
	Yuchung Cheng <ycheng@...gle.com>, Kevin Yang <yyd@...gle.com>, Jon Maloy <jmaloy@...hat.com>
Subject: Re: Re: [EXT] Re: tcp: socket stuck with zero receive window after SACK

On Wed, 21 May 2025 at 17:56, Neal Cardwell <ncardwell@...gle.com> wrote:
> For my education, why do you set net.ipv4.tcp_shrink_window=1?

We enabled it mainly as an attempt to decrease the frequency of a
different issue, discussed in [1], in which jumbo frames were dropped
indefinitely on a host, presumably after memory pressure. That jumbo
frame issue is most likely triggered by system-wide memory pressure
rather than by hitting net.ipv4.tcp_mem. So
net.ipv4.tcp_shrink_window=1, which, as far as we understand, makes
hitting net.ipv4.tcp_mem less likely, probably didn't help reduce the
frequency of the jumbo frame issue. But the issue's impact was
serious enough, and we were unsure enough about the root cause, that
we deemed net.ipv4.tcp_shrink_window=1 worth a try. (Also, the
rationale behind net.ipv4.tcp_shrink_window=1 laid out in [2] and [3]
sounded reasonable.)
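
For reference, the knob is a plain procfs file, so toggling it needs
nothing beyond root and a kernel recent enough to have the sysctl; a
minimal Python sketch of what we do:

# Toggle net.ipv4.tcp_shrink_window by writing to procfs; equivalent
# to `sysctl -w net.ipv4.tcp_shrink_window=1`. Requires root.
PATH = "/proc/sys/net/ipv4/tcp_shrink_window"

def set_shrink_window(value):
    with open(PATH, "w") as f:
        f.write("%d\n" % value)

def get_shrink_window():
    with open(PATH) as f:
        return int(f.read())

set_shrink_window(1)
assert get_shrink_window() == 1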

But yes, it's feasible for us to revert to the default
net.ipv4.tcp_shrink_window=0, in particular because there's another
workaround for the jumbo frame issue: reducing the MTU. We set
net.ipv4.tcp_shrink_window=0 yesterday and haven't seen the issue
since. So:

6.6.74 + net.ipv4.tcp_shrink_window=1: issue occurs
6.6.83 + net.ipv4.tcp_shrink_window=1: issue occurs
6.6.74 + net.ipv4.tcp_shrink_window=0: no issue so far
6.6.83 + net.ipv4.tcp_shrink_window=0: no issue so far

Since the issue occurred sporadically, it's too soon to be fully
confident that it's gone with net.ipv4.tcp_shrink_window=0. We'll
write again in a week or so to confirm.

If net.ipv4.tcp_shrink_window=1 turns out to have caused this issue,
we'd still be curious to understand why it leads to TCP connections
being stuck indefinitely even though the recv-q (as reported by ss)
is 0.
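
As a side note on recv-q: as far as we know, the recv-q that ss
reports for an established socket counts only in-order bytes the
application hasn't read yet, so data parked in the out-of-order queue
(SACKed or not) wouldn't show up there at all. The same counter can
be read from inside the application via the SIOCINQ/FIONREAD ioctl;
a minimal Python sketch (the helper name is ours, just for
illustration):

import fcntl
import struct
import termios

def recv_q_bytes(sock):
    # FIONREAD on a TCP socket (aka SIOCINQ) returns the number of
    # unread bytes in the in-order receive queue; bytes held in the
    # out-of-order queue are not counted.
    buf = fcntl.ioctl(sock.fileno(), termios.FIONREAD,
                      struct.pack("i", 0))
    return struct.unpack("i", buf)[0]

If this also reads 0 on a stuck socket, that would be consistent with
the buffer being occupied by out-of-order data only.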
Assuming the recv-q was indeed correctly reported as 0, the issue
might be that receive buffers can fill up in such a way that the only
way for data to leave the receive buffer is the receipt of further
data. In particular, the application can't read data out of the
receive buffer and empty it that way. Maybe filling up the buffer
with data received out of order (whether we SACK it or not) satisfies
this condition: out-of-order data occupies the buffer but isn't
readable by the application until the gap in front of it is filled.
That would explain why we saw this issue only in the presence of SACK
flags before we disabled SACK.

With net.ipv4.tcp_shrink_window=1, a full receive buffer leads to a
zero window being advertised (see [2]), and if the buffer filled up
in such a way that no data can leave until further data is received,
we are stuck forever because the kernel drops incoming data due to
the zero window. In contrast, with net.ipv4.tcp_shrink_window=0, we
keep advertising a non-zero window, so incoming data isn't dropped
and data can still leave the receive buffer. I'm speculating here;
once we confirm that the issue was indeed triggered by
net.ipv4.tcp_shrink_window=1, I'd be keen to hear other thoughts on
why the setting may have this effect in certain environments.
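
To make this speculation concrete, below is a toy model of the
hypothesised deadlock. This is our sketch, not kernel code: the
shared byte budget, the segment-granularity accounting, and modelling
net.ipv4.tcp_shrink_window=0 as "never advertise less than previously
promised" are all simplifying assumptions on our part.

BUF = 8  # receive buffer budget, counted in whole segments

def advertised_window(free, promised, shrink_window):
    # shrink_window=1: advertise exactly the free budget (may be 0).
    # shrink_window=0: never advertise less than already promised.
    return free if shrink_window else max(free, promised)

def run(shrink_window):
    # Segment 0 is lost; segments 1..BUF arrive, are SACKed, and sit
    # in the out-of-order queue. recv-q counts only in-order data, so
    # it reads 0 even though the whole budget is used.
    in_order, out_of_order = 0, BUF
    window = advertised_window(BUF - in_order - out_of_order, BUF,
                               shrink_window)
    # The peer retransmits segment 0, which would let it all drain.
    if window >= 1:
        in_order = out_of_order + 1  # gap filled; now readable (the
        out_of_order = 0             # budget is overshot by one)
        return "drains: app can now read %d segments" % in_order
    return "stuck: zero window, retransmit of segment 0 is dropped"

print("shrink_window=1 ->", run(True))
print("shrink_window=0 ->", run(False))

Under these assumptions, shrink_window=1 deadlocks once the buffer is
full of out-of-order data, while shrink_window=0 accepts the
retransmit at the cost of overshooting the buffer budget, which, as
far as we understand, is exactly the overshoot that [2] and [3] set
out to eliminate.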

[1] https://marc.info/?l=linux-netdev&m=174600337131981&w=2
[2] https://github.com/torvalds/linux/commit/b650d953cd391595e536153ce30b4aab385643ac
[3] https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
