Message-ID: <20260120214802.270100-1-kuniyu@google.com>
Date: Tue, 20 Jan 2026 21:47:16 +0000
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: gopalmalaviya53@...il.com
Cc: netdev@...r.kernel.org
Subject: Re: [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets

From: Gopal Malaviya <gopalmalaviya53@...il.com>
Date: Wed, 21 Jan 2026 02:03:39 +0530
> Hi,
> 
> Background:
> 
> I am looking into cases where TCP sockets that transition into
> half-closed states (CLOSE_WAIT or FIN_WAIT2 after receipt of FIN)
> remain available long enough to be reused by userland connection
> pools. In some HTTP client workloads, especially those involving
> frequent requests with large request bodies, reuse of such sockets
> can lead to follow-up failures such as timeouts or premature close
> events on subsequent operations.
> 
> This behavior is compliant with TCP semantics, but application-level
> connection pools may incorrectly assume that a socket is still usable
> as long as it has not been explicitly closed.
> 
> Problem:
> 
> When a remote peer closes its send side early, the local socket
> enters a half-closed state as described in RFC 793, RFC 1122, and
> RFC 9293. These states are correct and expected. However, sockets
> in CLOSE_WAIT or FIN_WAIT2 may persist long enough to be returned
> to userland pools, even though practical data exchange is no longer
> possible.
> 
> For workloads that rely heavily on persistent connection reuse,
> this can cause intermittent and difficult-to-diagnose failures.
> 
> Proposal:
> 
> Introduce an optional sysctl:
> 
>     net.ipv4.tcp_aggressive_halfclose = 0 (default)
> 
> When enabled:
> 
>   - Upon receiving FIN and transitioning into CLOSE_WAIT or FIN_WAIT2,
>     the socket is marked as a candidate for early teardown.
> 
>   - After a short configurable grace period (seconds or keepalive
>     probes), if the socket remains half-closed, the kernel performs
>     a normal teardown using existing mechanisms (e.g. tcp_done()).
> 
>   - Sockets handled in this mode would also avoid TIME_WAIT reuse,
>     ensuring they are not inadvertently returned to userland.
> 
> A secondary sysctl could control the grace interval, for example:
> 
>     net.ipv4.tcp_aggressive_halfclose_grace = <seconds>
> 
> Default TCP behavior remains unchanged unless explicitly enabled.
> 
> Rationale:
> 
> The intent is to provide an opt-in mechanism for environments where
> reuse of half-closed sockets interacts poorly with application-managed
> connection pools. The proposal does not modify semantics for established
> connections, connection setup, or orderly close initiated locally.
> 
> RFC 793, RFC 1122, and RFC 9293 define the TCP state machine and
> half-close behavior but allow implementations flexibility in resource
> management and socket lifetime. This proposal aims to use that
> flexibility in a narrowly-scoped and optional manner.
> 
> Implementation notes (initial thoughts):
> 
>   - Tag sockets on FIN reception when entering CLOSE_WAIT or FIN_WAIT2.
>   - Apply a short timer or probe-based grace period.
>   - On expiry, perform standard teardown.
>   - Avoid TIME_WAIT reuse for sockets marked for aggressive half-close.
>   - Keep all behavior gated behind sysctl(s).
> 
> Request for feedback:
> 
> Before preparing a full patch series, I would appreciate feedback on:
> 
>   - Whether the general idea is acceptable as an opt-in extension.
>   - Preferred naming and placement of the sysctl(s).
>   - Whether a grace period is preferred over immediate teardown.
>   - Any interactions with existing timers or state transitions
>     that should be considered.
>   - Any related prior discussions worth reviewing.

You can implement the logic in userspace,
e.g. with "ss --kill":

1. Create CLOSE-WAIT and FIN-WAIT-2 sockets

  # python3
  >>> from socket import *
  >>> s = socket()
  >>> s.listen()
  >>> c = socket()
  >>> c.connect(s.getsockname())
  >>> s1, _ = s.accept()
  >>> c
  <socket.socket fd=6, family=2, type=1, proto=0, laddr=('127.0.0.1', 46490), raddr=('127.0.0.1', 58241)>
  >>> c.close()

  # ss -tan
  ...
  CLOSE-WAIT 1      0  127.0.0.1:58241      127.0.0.1:46490
  FIN-WAIT-2 0      0  127.0.0.1:46490      127.0.0.1:58241

2. Close them

  # ss --kill -t sport == 46490
  ...
  FIN-WAIT-2 0      0  127.0.0.1:46490      127.0.0.1:58241

  # ss --kill -t dport == 46490
  ...
  CLOSE-WAIT 1      0  127.0.0.1:58241      127.0.0.1:46490
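
For the pool-side symptom described in the quoted message, another
userspace option is to check a socket before handing it back out for
reuse. What follows is a minimal sketch, not a definitive implementation.
It assumes a stream socket on a platform that provides MSG_DONTWAIT
(Linux does), and the names peer_has_closed() and demo() are hypothetical,
chosen only for illustration. It detects the CLOSE-WAIT side only, i.e.
the case where the peer's FIN has already arrived and no unread data
is left in the receive queue.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer's FIN has arrived and no unread data remains."""
    try:
        # Peek at one byte without consuming it and without blocking.
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        # Nothing to read and no FIN yet: the connection still looks open.
        return False
    except OSError:
        # Reset or otherwise broken: treat the socket as unusable.
        return True
    # An empty read means end-of-stream, i.e. the local side is in CLOSE-WAIT.
    return data == b""


def demo():
    """Reproduce step 1 above on loopback and check the accepted socket."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    client = socket.socket()
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()

    before = peer_has_closed(server_side)
    client.close()
    # Wait (up to 1s) for the FIN to make the socket readable.
    select.select([server_side], [], [], 1.0)
    after = peer_has_closed(server_side)

    server_side.close()
    listener.close()
    return before, after
```

A pool could call peer_has_closed() right before reusing a connection
and discard the socket when it returns True. This does not remove the
race where the FIN arrives just after the check, so the caller still
needs to handle a failed request.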
