lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 5 Jun 2023 15:42:29 -0700
From: Stephen Hemminger <stephen@...workplumber.org>
To: Mike Freemon <mfreemon@...udflare.com>
Cc: netdev@...r.kernel.org, kernel-team@...udflare.com
Subject: Re: [PATCH] Add a sysctl to allow TCP window shrinking in order to
 honor memory limits

On Mon,  5 Jun 2023 15:38:57 -0500
Mike Freemon <mfreemon@...udflare.com> wrote:

> From: "mfreemon@...udflare.com" <mfreemon@...udflare.com>
> 
> Under certain circumstances, the tcp receive buffer memory limit
> set by autotuning is ignored, and the receive buffer can grow
> unrestrained until it reaches tcp_rmem[2].
> 
> To reproduce:  Connect a TCP session with the receiver doing
> nothing and the sender sending small packets (an infinite loop
> of socket send() with 4 bytes of payload with a sleep of 1 ms
> in between each send()).  This will fill the tcp receive buffer
> all the way to tcp_rmem[2], ignoring the autotuning limit
> (sk_rcvbuf).
> 
> As a result, a host can have individual tcp sessions with receive
> buffers of size tcp_rmem[2], and the host itself can reach tcp_mem
> limits, causing the host to go into tcp memory pressure mode.
> 
> The fundamental issue is the relationship between the granularity
> of the window scaling factor and the number of byte ACKed back
> to the sender.  This problem has previously been identified in
> RFC 7323, appendix F [1].
> 
> The Linux kernel currently adheres to never shrinking the window.
> 
> In addition to the overallocation of memory mentioned above, this
> is also functionally incorrect, because once tcp_rmem[2] is
> reached, the receiver will drop in-window packets resulting in
> retransmissions and an eventual timeout of the tcp session.  A
> receive buffer full condition should instead result in a zero
> window and an indefinite wait.
> 
> In practice, this problem is largely hidden for most flows.  It
> is not applicable to mice flows.  Elephant flows can send data
> fast enough to "overrun" the sk_rcvbuf limit (in a single ACK),
> triggering a zero window.
> 
> But this problem does show up for other types of flows.  A good
> example are websockets and other type of flows that send small
> amounts of data spaced apart slightly in time.  In these cases,
> we directly encounter the problem described in [1].
> 
> RFC 7323, section 2.4 [2], says there are instances when a retracted
> window can be offered, and that TCP implementations MUST ensure
> that they handle a shrinking window, as specified in RFC 1122,
> section 4.2.2.16 [3].  All prior RFCs on the topic of tcp window
> management have made clear that sender must accept a shrunk window
> from the receiver, including RFC 793 [4] and RFC 1323 [5].
> 
> This patch implements the functionality to shrink the tcp window
> when necessary to keep the right edge within the memory limit by
> autotuning (sk_rcvbuf).  This new functionality is enabled with
> the following sysctl:
> 
> sysctl: net.ipv4.tcp_shrink_window
> 
> This sysctl changes how the TCP window is calculated.
> 
> If sysctl tcp_shrink_window is zero (the default value), then the
> window is never shrunk.
> 
> If sysctl tcp_shrink_window is non-zero, then the memory limit
> set by autotuning is honored.  This requires that the TCP window
> be shrunk ("retracted") as described in RFC 1122.
> 
> [1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
> [2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
> [3] https://www.rfc-editor.org/rfc/rfc1122#page-91
> [4] https://www.rfc-editor.org/rfc/rfc793
> [5] https://www.rfc-editor.org/rfc/rfc1323
> 
> Signed-off-by: Mike Freemon <mfreemon@...udflare.com>

Does Linux TCP really need another tuning parameter?
Will tests get run with both feature on and off?
What default will distributions ship with?

Sounds like unbounded receive window growth is always a bad
idea and a latent bug.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ