Message-ID: <CAAHxn9_waCMAh3Me63WQv+1h=FmT10grA13t09xaym4hX1KgCg@mail.gmail.com>
Date: Thu, 22 May 2025 12:34:18 +0200
From: Simon Campion <simon.campion@...pl.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>, 
	Yuchung Cheng <ycheng@...gle.com>, Kevin Yang <yyd@...gle.com>, Jon Maloy <jmaloy@...hat.com>
Subject: Re: Re: [EXT] Re: tcp: socket stuck with zero receive window after SACK

On Wed, 21 May 2025 at 17:56, Neal Cardwell <ncardwell@...gle.com> wrote:
> For my education, why do you set net.ipv4.tcp_shrink_window=1?

We enabled it mainly as an attempt to decrease the frequency of a
different issue, discussed in [1], in which jumbo frames were dropped
indefinitely on a host, presumably after memory pressure. That jumbo
frame issue is most likely triggered by system-wide memory pressure
rather than by hitting net.ipv4.tcp_mem. So
net.ipv4.tcp_shrink_window=1, which, as far as we understand, makes
hitting net.ipv4.tcp_mem less likely, probably didn't help reduce the
frequency of the jumbo frame issue. But the issue's impact was
serious enough, and we were unsure enough about the root cause, that
we deemed net.ipv4.tcp_shrink_window=1 worth a try. (Also, the
rationale behind net.ipv4.tcp_shrink_window=1 laid out in [2] and [3]
sounded reasonable.)
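
For reference, the knob is a plain procfs file, so toggling it needs
nothing beyond root and a kernel recent enough to have the sysctl; a
minimal Python sketch of what we do:

# Toggle net.ipv4.tcp_shrink_window by writing to procfs; equivalent
# to `sysctl -w net.ipv4.tcp_shrink_window=1`. Requires root.
PATH = "/proc/sys/net/ipv4/tcp_shrink_window"

def set_shrink_window(value):
    with open(PATH, "w") as f:
        f.write("%d\n" % value)

def get_shrink_window():
    with open(PATH) as f:
        return int(f.read())

set_shrink_window(1)
assert get_shrink_window() == 1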

But yes, it's feasible for us to revert to the default
net.ipv4.tcp_shrink_window=0, in particular because there's another
workaround for the jumbo frame issue: reducing the MTU. We set
net.ipv4.tcp_shrink_window=0 yesterday and haven't seen the issue
since. So:

6.6.74 + net.ipv4.tcp_shrink_window=1: issue occurs
6.6.83 + net.ipv4.tcp_shrink_window=1: issue occurs
6.6.74 + net.ipv4.tcp_shrink_window=0: no issue so far
6.6.83 + net.ipv4.tcp_shrink_window=0: no issue so far

Since the issue occurred sporadically, it's too soon to be fully
confident that it's gone with net.ipv4.tcp_shrink_window=0. We'll
write again in a week or so to confirm.

If net.ipv4.tcp_shrink_window=1 turns out to have caused this issue,
we'd still be curious to understand why it leads to TCP connections
being stuck indefinitely even though the recv-q (as reported by ss)
is 0.
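
As a side note on recv-q: as far as we know, the recv-q that ss
reports for an established socket counts only in-order bytes the
application hasn't read yet, so data parked in the out-of-order queue
(SACKed or not) wouldn't show up there at all. The same counter can
be read from inside the application via the SIOCINQ/FIONREAD ioctl;
a minimal Python sketch (the helper name is ours, just for
illustration):

import fcntl
import struct
import termios

def recv_q_bytes(sock):
    # FIONREAD on a TCP socket (aka SIOCINQ) returns the number of
    # unread bytes in the in-order receive queue; bytes held in the
    # out-of-order queue are not counted.
    buf = fcntl.ioctl(sock.fileno(), termios.FIONREAD,
                      struct.pack("i", 0))
    return struct.unpack("i", buf)[0]

If this also reads 0 on a stuck socket, that would be consistent with
the buffer being occupied by out-of-order data only.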
Assuming the recv-q was indeed correctly reported as 0, the issue
might be that receive buffers can fill up in such a way that the only
way for data to leave the receive buffer is the receipt of further
data. In particular, the application can't read data out of the
receive buffer and empty it that way. Maybe filling up the buffer
with data received out of order (whether we SACK it or not) satisfies
this condition: out-of-order data occupies the buffer but isn't
readable by the application until the gap in front of it is filled.
That would explain why we saw this issue only in the presence of SACK
flags before we disabled SACK.

With net.ipv4.tcp_shrink_window=1, a full receive buffer leads to a
zero window being advertised (see [2]), and if the buffer filled up
in such a way that no data can leave until further data is received,
we are stuck forever because the kernel drops incoming data due to
the zero window. In contrast, with net.ipv4.tcp_shrink_window=0, we
keep advertising a non-zero window, so incoming data isn't dropped
and data can still leave the receive buffer. I'm speculating here;
once we confirm that the issue was indeed triggered by
net.ipv4.tcp_shrink_window=1, I'd be keen to hear other thoughts on
why the setting may have this effect in certain environments.
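
To make this speculation concrete, below is a toy model of the
hypothesised deadlock. This is our sketch, not kernel code: the
shared byte budget, the segment-granularity accounting, and modelling
net.ipv4.tcp_shrink_window=0 as "never advertise less than previously
promised" are all simplifying assumptions on our part.

BUF = 8  # receive buffer budget, counted in whole segments

def advertised_window(free, promised, shrink_window):
    # shrink_window=1: advertise exactly the free budget (may be 0).
    # shrink_window=0: never advertise less than already promised.
    return free if shrink_window else max(free, promised)

def run(shrink_window):
    # Segment 0 is lost; segments 1..BUF arrive, are SACKed, and sit
    # in the out-of-order queue. recv-q counts only in-order data, so
    # it reads 0 even though the whole budget is used.
    in_order, out_of_order = 0, BUF
    window = advertised_window(BUF - in_order - out_of_order, BUF,
                               shrink_window)
    # The peer retransmits segment 0, which would let it all drain.
    if window >= 1:
        in_order = out_of_order + 1  # gap filled; now readable (the
        out_of_order = 0             # budget is overshot by one)
        return "drains: app can now read %d segments" % in_order
    return "stuck: zero window, retransmit of segment 0 is dropped"

print("shrink_window=1 ->", run(True))
print("shrink_window=0 ->", run(False))

Under these assumptions, shrink_window=1 deadlocks once the buffer is
full of out-of-order data, while shrink_window=0 accepts the
retransmit at the cost of overshooting the buffer budget, which, as
far as we understand, is exactly the overshoot that [2] and [3] set
out to eliminate.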

[1] https://marc.info/?l=linux-netdev&m=174600337131981&w=2
[2] https://github.com/torvalds/linux/commit/b650d953cd391595e536153ce30b4aab385643ac
[3] https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
