Message-Id: <20231012084601.63183-1-kuro@kuroa.me>
Date: Thu, 12 Oct 2023 16:46:01 +0800
From: Xueming Feng <kuro@...oa.me>
To: cdleonard@...il.com
Cc: davem@...emloft.net,
dsahern@...nel.org,
edumazet@...gle.com,
kuba@...nel.org,
linux-kernel@...r.kernel.org,
netdev@...r.kernel.org,
pabeni@...hat.com,
usama.anjum@...labora.com,
yoshfuji@...ux-ipv6.org
Subject: Re: [PATCH RFC] tcp: diag: Also support for FIN_WAIT1 sockets for tcp_abort()
> Aborting tcp connections via ss -K doesn't work in TCP_FIN_WAIT1 state,
> this happens because the SOCK_DEAD flag is set. Fix by ignoring that flag
> for this special case.
>
> Signed-off-by: Leonard Crestez <cdleonard@...il.com>
>
> ---
> net/ipv4/tcp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> I tested that this fixes the problem but not certain about correctness.
>
> Support for TCP_TIME_WAIT was added recently but it doesn't fix
> TCP_FIN_WAIT1.
>
> See: https://lore.kernel.org/netdev/20220627121038.226500-1-edumazet@...gle.com/
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index d9dd998fdb76..215e7d3fed13 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4661,11 +4661,11 @@ int tcp_abort(struct sock *sk, int err)
>
> /* Don't race with BH socket closes such as inet_csk_listen_stop. */
> local_bh_disable();
> bh_lock_sock(sk);
>
> - if (!sock_flag(sk, SOCK_DEAD)) {
> + if (sk->sk_state == TCP_FIN_WAIT1 || !sock_flag(sk, SOCK_DEAD)) {
> sk->sk_err = err;
> /* This barrier is coupled with smp_rmb() in tcp_poll() */
> smp_wmb();
> sk_error_report(sk);
> if (tcp_need_reset(sk->sk_state))
> --
I recently encountered a problem that is related to this patch. Some of our
machines have orphaned TCP connections in FIN_WAIT1 state that are stuck in
zero-window probing, because the probes keep being acked.
So I decided to kill them with `ss -K`, which calls `tcp_abort`. It failed to
kill the socket while reporting success. Worse, the socket stopped probing and
now stays in FIN_WAIT1 state *forever*, with ss reporting no timer associated
with the socket.
After some amateurish debugging, I found that the FIN_WAIT1 socket has the
SOCK_DEAD flag set. As a result, `tcp_abort` does not call `tcp_done`, but it
still clears both `sk_write_queue` and `tcp_rtx_queue` through
`tcp_write_queue_purge()`. This causes a problem when the socket is in the
'persist' or 'retransmit' state. `tcp_probe_timer()` resets the timer only if
`sk_write_queue` is not empty, and likewise `tcp_retransmit_timer()` resets the
timer only if `tcp_rtx_queue` is not empty. Clearing those queues without
actually closing the socket means the timer is never reset, and the socket is
stuck in FIN_WAIT1 state forever.
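To make the sequence concrete, here is a small userspace toy model in C. It is only a sketch of the behaviour described in this mail, not kernel code: every `model_*` name is hypothetical, the queues are reduced to counters, and the single `timer_armed` flag stands in for both the persist and the retransmit timer. The `patched` argument switches between the original condition and the one from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names mirror the kernel's, but none of this is kernel code. */
enum model_state { MODEL_FIN_WAIT1, MODEL_CLOSE };

struct model_sock {
    enum model_state state;
    bool dead;         /* stands in for the SOCK_DEAD flag (orphaned socket) */
    int write_queue;   /* stands in for sk_write_queue (unsent data) */
    int rtx_queue;     /* stands in for tcp_rtx_queue (sent, not yet acked) */
    bool timer_armed;  /* persist or retransmit timer pending */
};

/* An orphaned FIN_WAIT1 socket stuck in zero-window probing. */
static struct model_sock model_orphan_fin_wait1(void)
{
    struct model_sock sk = { MODEL_FIN_WAIT1, true, 1, 0, true };
    return sk;
}

/* Timer handler as described in the mail: re-arm only while data is queued. */
static void model_timer_fire(struct model_sock *sk)
{
    assert(sk->write_queue >= 0 && sk->rtx_queue >= 0);
    sk->timer_armed = (sk->write_queue > 0 || sk->rtx_queue > 0);
}

/* Abort path: 'patched' selects the condition from the quoted diff. */
static void model_abort(struct model_sock *sk, bool patched)
{
    bool close_it = patched
        ? (sk->state == MODEL_FIN_WAIT1 || !sk->dead)
        : !sk->dead;

    if (close_it) {
        sk->state = MODEL_CLOSE;   /* stands in for tcp_done() */
        sk->timer_armed = false;
    }

    /* The queues are purged whether or not the socket was closed. */
    sk->write_queue = 0;
    sk->rtx_queue = 0;
}
```

Calling `model_abort(&sk, false)` on the socket from `model_orphan_fin_wait1()` and then `model_timer_fire(&sk)` leaves it in `MODEL_FIN_WAIT1` with no timer armed, which is the stuck state I observed. With `patched` set to true the socket goes to `MODEL_CLOSE` instead.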
I can confirm that this patch does indeed close the socket, but being a newbie,
I am also not sure about its logical correctness.