linux-kernel - Re: [PATCH net] tcp: fix forever orphan socket caused by tcp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iLNnXEnaAY8xMQR6zeJPTd6ZxnJWo3vHE4d7oe9uXRMUg@mail.gmail.com>
Date: Mon, 5 Aug 2024 09:22:53 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: Xueming Feng <kuro@...oa.me>, Lorenzo Colitti <lorenzo@...gle.com>, 
	"David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org, 
	Neal Cardwell <ncardwell@...gle.com>, Yuchung Cheng <ycheng@...gle.com>, 
	Soheil Hassas Yeganeh <soheil@...gle.com>, David Ahern <dsahern@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] tcp: fix forever orphan socket caused by tcp_abort

On Mon, Aug 5, 2024 at 6:52 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Sat, Aug 3, 2024 at 11:48 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > Hello Eric,
> >
> > On Thu, Aug 1, 2024 at 9:17 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@...oa.me> wrote:
> > > >
> > > > We have some problem closing zero-window fin-wait-1 tcp sockets in our
> > > > environment. This patch come from the investigation.
> > > >
> > > > Previously tcp_abort only sends out reset and calls tcp_done when the
> > > > socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> > > > purging the write queue, but not close the socket and left it to the
> > > > timer.
> > > >
> > > > While purging the write queue, tp->packets_out and sk->sk_write_queue
> > > > is cleared along the way. However tcp_retransmit_timer have early
> > > > return based on !tp->packets_out and tcp_probe_timer have early
> > > > return based on !sk->sk_write_queue.
> > > >
> > > > This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> > > > and socket not being killed by the timers. Converting a zero-windowed
> > > > orphan to a forever orphan.
> > > >
> > > > This patch removes the SOCK_DEAD check in tcp_abort, making it send
> > > > reset to peer and close the socket accordingly. Preventing the
> > > > timer-less orphan from happening.
> > > >
> > > > Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> > > > Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> > > > Signed-off-by: Xueming Feng <kuro@...oa.me>
> > >
> > > This seems legit, but are you sure these two blamed commits added this bug ?
> > >
> > > Even before them, we should have called tcp_done() right away, instead
> > > of waiting for a (possibly long) timer to complete the job.
> > >
> > > This might be important when killing millions of sockets on a busy server.
> > >
> > > CC Lorenzo
> > >
> > > Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?
> >
> > I guess that one of possible reasons is to avoid double-free,
> > something like this, happening in inet_csk_destroy_sock().
> >
> > Let me assume: if we call tcp_close() first under the memory pressure
> > which means tcp_check_oom() returns true and then it will call
> > inet_csk_destroy_sock() in __tcp_close(), later tcp_abort() will call
> > tcp_done() to free the sk again in the inet_csk_destroy_sock() when
> > not testing the SOCK_DEAD flag in tcp_abort.
> >
>
> How about this one which can prevent double calling
> inet_csk_destroy_sock() when we call destroy and close nearly at the
> same time under that circumstance:
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e03a342c9162..d5d3b21cc824 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4646,7 +4646,7 @@ int tcp_abort(struct sock *sk, int err)
>         local_bh_disable();
>         bh_lock_sock(sk);
>
> -       if (!sock_flag(sk, SOCK_DEAD)) {
> +       if (sk->sk_state != TCP_CLOSE) {
>                 if (tcp_need_reset(sk->sk_state))
>                         tcp_send_active_reset(sk, GFP_ATOMIC,
>                                               SK_RST_REASON_NOT_SPECIFIED);
>
> Each time we call inet_csk_destroy_sock(), we must make sure we've
> already set the state to TCP_CLOSE. Based on this, I think we can use
> this as an indicator to avoid calling twice to destroy the socket.

I do not think this will work.

With this patch, a listener socket will not get an error notification.

Ideally we need tests for this seldom used feature.