Message-ID: <CAL+tcoApiWPx8JW9DeQ6VbAH7Dnqtw7PmVVvup9HMyBHHDhvcQ@mail.gmail.com>
Date: Mon, 12 Aug 2024 22:00:34 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Xueming Feng <kuro@...oa.me>
Cc: "David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Eric Dumazet <edumazet@...gle.com>, Lorenzo Colitti <lorenzo@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>, Yuchung Cheng <ycheng@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>, David Ahern <dsahern@...nel.org>, linux-kernel@...r.kernel.org,
Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH net,v2] tcp: fix forever orphan socket caused by tcp_abort
On Mon, Aug 12, 2024 at 6:53 PM Xueming Feng <kuro@...oa.me> wrote:
>
> We have had trouble closing zero-window FIN-WAIT-1 TCP sockets in our
> environment. This patch comes from the investigation.
>
> Previously, tcp_abort only sent a reset and called tcp_done when the
> socket was not SOCK_DEAD (that is, not an orphan). For an orphan socket,
> it only purged the write queue; it did not close the socket and left
> that to the timer.
>
> While purging the write queue, tp->packets_out and sk->sk_write_queue
> are cleared along the way. However, tcp_retransmit_timer has an early
> return based on !tp->packets_out, and tcp_probe_timer has an early
> return based on !sk->sk_write_queue.
>
> As a result, ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 are not rescheduled
> and the socket is never killed by the timers, turning a zero-window
> orphan into a forever orphan.
>
> This patch removes the SOCK_DEAD check in tcp_abort, making it send a
> reset to the peer and close the socket accordingly. This prevents the
> timer-less orphan from happening.
>
> According to Lorenzo's email in the v1 thread, the check was there to
> prevent force-closing the same socket twice. That situation is handled
> by testing for TCP_CLOSE inside lock, and returning -ENOENT if it is
> already closed.
>
> The -ENOENT code comes from the associated patch Lorenzo made for
> iproute2-ss; link attached below.
>
> Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/
> Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
> Signed-off-by: Xueming Feng <kuro@...oa.me>

You seem to have forgotten to CC Jakub and Paolo, who are also
networking maintainers.

> ---
> net/ipv4/tcp.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e03a342c9162..831a18dc7aa6 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4637,6 +4637,13 @@ int tcp_abort(struct sock *sk, int err)
> /* Don't race with userspace socket closes such as tcp_close. */
> lock_sock(sk);
>
> + /* Avoid closing the same socket twice. */
> + if (sk->sk_state == TCP_CLOSE) {
> + if (!has_current_bpf_ctx())
> + release_sock(sk);
> + return -ENOENT;
> + }
> +
> if (sk->sk_state == TCP_LISTEN) {
> tcp_set_state(sk, TCP_CLOSE);
> inet_csk_listen_stop(sk);
> @@ -4646,16 +4653,13 @@ int tcp_abort(struct sock *sk, int err)
> local_bh_disable();
> bh_lock_sock(sk);
>
> - if (!sock_flag(sk, SOCK_DEAD)) {
> - if (tcp_need_reset(sk->sk_state))
> - tcp_send_active_reset(sk, GFP_ATOMIC,
> - SK_RST_REASON_NOT_SPECIFIED);
> - tcp_done_with_error(sk, err);
> - }
> + if (tcp_need_reset(sk->sk_state))
> + tcp_send_active_reset(sk, GFP_ATOMIC,
> + SK_RST_REASON_NOT_SPECIFIED);

Please use SK_RST_REASON_TCP_STATE here. I should have pointed this out earlier.

Please feel free to add:
Reviewed-by: Jason Xing <kerneljasonxing@...il.com>
in your next submission.

Thanks,
Jason

> + tcp_done_with_error(sk, err);
>
> bh_unlock_sock(sk);
> local_bh_enable();
> - tcp_write_queue_purge(sk);
> if (!has_current_bpf_ctx())
> release_sock(sk);
> return 0;
> --