Message-ID: <20260120214802.270100-1-kuniyu@google.com>
Date: Tue, 20 Jan 2026 21:47:16 +0000
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: gopalmalaviya53@...il.com
Cc: netdev@...r.kernel.org
Subject: Re: [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets
From: Gopal Malaviya <gopalmalaviya53@...il.com>
Date: Wed, 21 Jan 2026 02:03:39 +0530
> Hi,
>
> Background:
>
> I am looking into cases where TCP sockets that transition into
> half-closed states (CLOSE_WAIT or FIN_WAIT2 after receipt of FIN)
> remain available long enough to be reused by userland connection
> pools. In some HTTP client workloads, especially those involving
> frequent requests with large request bodies, reuse of such sockets
> can lead to follow-up failures such as timeouts or premature close
> events on subsequent operations.
>
> This behavior is compliant with TCP semantics, but application-level
> connection pools may incorrectly assume that a socket is still usable
> as long as it has not been explicitly closed.
>
> Problem:
>
> When a remote peer closes its send side early, the local socket
> enters a half-closed state as described in RFC 793, RFC 1122, and
> RFC 9293. These states are correct and expected. However, sockets
> in CLOSE_WAIT or FIN_WAIT2 may persist long enough to be returned
> to userland pools, even though practical data exchange is no longer
> possible.
>
> For workloads that rely heavily on persistent connection reuse,
> this can cause intermittent and difficult-to-diagnose failures.
>
> Proposal:
>
> Introduce an optional sysctl:
>
> net.ipv4.tcp_aggressive_halfclose = 0 (default)
>
> When enabled:
>
> - Upon receiving FIN and transitioning into CLOSE_WAIT or FIN_WAIT2,
> the socket is marked as a candidate for early teardown.
>
> - After a short configurable grace period (seconds or keepalive
> probes), if the socket remains half-closed, the kernel performs
> a normal teardown using existing mechanisms (e.g. tcp_done()).
>
> - Sockets handled in this mode would also avoid TIME_WAIT reuse,
> ensuring they are not inadvertently returned to userland.
>
> A secondary sysctl could control the grace interval, for example:
>
> net.ipv4.tcp_aggressive_halfclose_grace = <seconds>
>
> Default TCP behavior remains unchanged unless explicitly enabled.
>
> Rationale:
>
> The intent is to provide an opt-in mechanism for environments where
> reuse of half-closed sockets interacts poorly with application-managed
> connection pools. The proposal does not modify semantics for established
> connections, connection setup, or orderly close initiated locally.
>
> RFC 793, RFC 1122, and RFC 9293 define the TCP state machine and
> half-close behavior but allow implementations flexibility in resource
> management and socket lifetime. This proposal aims to use that
> flexibility in a narrowly-scoped and optional manner.
>
> Implementation notes (initial thoughts):
>
> - Tag sockets on FIN reception when entering CLOSE_WAIT or FIN_WAIT2.
> - Apply a short timer or probe-based grace period.
> - On expiry, perform standard teardown.
> - Avoid TIME_WAIT reuse for sockets marked for aggressive half-close.
> - Keep all behavior gated behind sysctl(s).
>
> Request for feedback:
>
> Before preparing a full patch series, I would appreciate feedback on:
>
> - Whether the general idea is acceptable as an opt-in extension.
> - Preferred naming and placement of the sysctl(s).
> - Whether a grace period is preferred over immediate teardown.
> - Any interactions with existing timers or state transitions
> that should be considered.
> - Any related prior discussions worth reviewing.
You can implement this logic in userspace, e.g. with "ss --kill":
1. Create CLOSE-WAIT and FIN-WAIT-2 sockets
# python3
>>> from socket import *
>>> s = socket()
>>> s.listen()
>>> c = socket()
>>> c.connect(s.getsockname())
>>> s1, _ = s.accept()
>>> c
<socket.socket fd=6, family=2, type=1, proto=0, laddr=('127.0.0.1', 46490), raddr=('127.0.0.1', 58241)>
>>> c.close()
# ss -tan
...
CLOSE-WAIT 1 0 127.0.0.1:58241 127.0.0.1:46490
FIN-WAIT-2 0 0 127.0.0.1:46490 127.0.0.1:58241
2. Close them
# ss --kill -t sport == 46490
...
FIN-WAIT-2 0 0 127.0.0.1:46490 127.0.0.1:58241
# ss --kill -t dport == 46490
...
CLOSE-WAIT 1 0 127.0.0.1:58241 127.0.0.1:46490
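On the pool side, a check before reuse can catch the CLOSE-WAIT case without any kernel change. The following is only a minimal sketch, assuming Linux and a blocking TCP socket; peer_has_closed() and demo() are hypothetical helper names, not part of any existing library. It peeks at the receive queue without consuming data and treats an empty read as the peer's FIN. It does not cover FIN-WAIT-2, since that socket has already been closed locally.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer's FIN has been received and no unread
    data is queued ahead of it, or if the connection has failed."""
    try:
        # MSG_PEEK leaves any pending data in the receive queue;
        # MSG_DONTWAIT makes this single call non-blocking.
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return False  # nothing queued and no FIN seen: still open
    except ConnectionError:
        return True   # reset or aborted: unusable either way
    return data == b""  # empty read means EOF, i.e. the peer sent FIN


def demo():
    """Recreate the CLOSE-WAIT socket from step 1 and probe it."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # ephemeral port on loopback
    listener.listen()
    client = socket.socket()
    client.connect(listener.getsockname())
    accepted, _ = listener.accept()

    before = peer_has_closed(accepted)  # peer still open
    client.close()                      # peer sends FIN
    # Wait (up to 2 s) for the FIN to arrive; EOF makes the socket readable.
    select.select([accepted], [], [], 2.0)
    after = peer_has_closed(accepted)   # accepted socket is now in CLOSE-WAIT

    accepted.close()
    listener.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

A pool would call peer_has_closed() when checking a socket out and discard it on True. The check is inherently racy, because the FIN can still arrive right after it returns False, so the caller needs to handle a failed request anyway.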