Message-ID: <CANn89iKp=Mxu+kyB3cSB2sKevMJa6A3octSCJZM=oz4q+DC=bA@mail.gmail.com>
Date: Thu, 1 Aug 2024 15:11:10 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Xueming Feng <kuro@...oa.me>, Lorenzo Colitti <lorenzo@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org, 
	Neal Cardwell <ncardwell@...gle.com>, Yuchung Cheng <ycheng@...gle.com>, 
	Soheil Hassas Yeganeh <soheil@...gle.com>, David Ahern <dsahern@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] tcp: fix forever orphan socket caused by tcp_abort
On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@...oa.me> wrote:
>
> We have a problem closing zero-window FIN-WAIT-1 TCP sockets in our
> environment. This patch comes from the investigation.
>
> Previously, tcp_abort only sent a reset and called tcp_done when the
> socket was not SOCK_DEAD (i.e. not an orphan). For an orphan socket, it
> only purged the write queue; it did not close the socket and left that
> to the timer.
>
> While purging the write queue, tp->packets_out and sk->sk_write_queue
> are cleared along the way. However, tcp_retransmit_timer returns early
> on !tp->packets_out, and tcp_probe_timer returns early on
> !sk->sk_write_queue.
>
> As a result, ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 are not rescheduled
> and the socket is never killed by the timers, turning a zero-window
> orphan into a forever orphan.
>
> This patch removes the SOCK_DEAD check in tcp_abort, so that it sends a
> reset to the peer and closes the socket accordingly, preventing the
> timer-less orphan from occurring.
>
> Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> Signed-off-by: Xueming Feng <kuro@...oa.me>

This seems legit, but are you sure these two blamed commits added this bug?

Even before them, we should have called tcp_done() right away, instead
of waiting for a (possibly long) timer to complete the job.

This might be important when killing millions of sockets on a busy server.

CC Lorenzo

Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag?

Thanks.
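
For readers following the thread, the failure mode described in the quoted
commit message can be sketched as a toy userspace model. This is not kernel
code: struct toy_sock and every toy_* function are hypothetical names made up
for illustration, the timers are reduced to the early-return checks the commit
message mentions, and the reset is sent unconditionally for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; fields only loosely mirror the kernel's. */
struct toy_sock {
    bool dead;                /* models SOCK_DEAD (orphan socket) */
    bool closed;              /* models the effect of tcp_done() */
    bool reset_sent;          /* models a reset having been sent to the peer */
    unsigned packets_out;     /* models tp->packets_out */
    unsigned write_queue_len; /* models a non-empty sk->sk_write_queue */
    bool timer_armed;         /* models a pending RETRANS/PROBE0 timer */
};

/* Purging the write queue clears both counters. */
static void toy_purge(struct toy_sock *sk)
{
    sk->packets_out = 0;
    sk->write_queue_len = 0;
}

/* Behaviour described as "previous": orphans are only purged. */
static void toy_abort_old(struct toy_sock *sk)
{
    if (!sk->dead) {
        sk->reset_sent = true;
        sk->closed = true;
    }
    toy_purge(sk);
}

/* Behaviour described for the patch: no SOCK_DEAD check. */
static void toy_abort_new(struct toy_sock *sk)
{
    sk->reset_sent = true;
    sk->closed = true;
    toy_purge(sk);
}

/* Timer fires; it re-arms only if it gets past the early return. */
static void toy_retransmit_timer(struct toy_sock *sk)
{
    sk->timer_armed = false;
    if (sk->closed || !sk->packets_out)
        return;
    sk->timer_armed = true;
}

static void toy_probe_timer(struct toy_sock *sk)
{
    sk->timer_armed = false;
    if (sk->closed || !sk->write_queue_len)
        return;
    sk->timer_armed = true;
}

/* An orphan that is neither closed nor covered by any timer. */
static bool toy_is_forever_orphan(const struct toy_sock *sk)
{
    return sk->dead && !sk->closed && !sk->timer_armed;
}

/* Runs both variants on an orphan with data still queued. */
int toy_demo(void)
{
    struct toy_sock old_sk = { .dead = true, .packets_out = 3,
                               .write_queue_len = 3, .timer_armed = true };
    struct toy_sock new_sk = old_sk;

    toy_abort_old(&old_sk);
    toy_retransmit_timer(&old_sk);
    toy_probe_timer(&old_sk);
    assert(toy_is_forever_orphan(&old_sk));

    toy_abort_new(&new_sk);
    assert(new_sk.closed && new_sk.reset_sent);
    assert(!toy_is_forever_orphan(&new_sk));
    return 0;
}
```

In this model the old path leaves the orphan with nothing queued and no timer
re-armed, so nothing ever closes it, while the new path closes it immediately
regardless of the orphan flag.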