lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iKp=Mxu+kyB3cSB2sKevMJa6A3octSCJZM=oz4q+DC=bA@mail.gmail.com>
Date: Thu, 1 Aug 2024 15:11:10 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Xueming Feng <kuro@...oa.me>, Lorenzo Colitti <lorenzo@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org, 
	Neal Cardwell <ncardwell@...gle.com>, Yuchung Cheng <ycheng@...gle.com>, 
	Soheil Hassas Yeganeh <soheil@...gle.com>, David Ahern <dsahern@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] tcp: fix forever orphan socket caused by tcp_abort

On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@...oa.me> wrote:
>
> We have some problem closing zero-window fin-wait-1 tcp sockets in our
> environment. This patch come from the investigation.
>
> Previously tcp_abort only sends out reset and calls tcp_done when the
> socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> purging the write queue, but not close the socket and left it to the
> timer.
>
> While purging the write queue, tp->packets_out and sk->sk_write_queue
> is cleared along the way. However tcp_retransmit_timer have early
> return based on !tp->packets_out and tcp_probe_timer have early
> return based on !sk->sk_write_queue.
>
> This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> and socket not being killed by the timers. Converting a zero-windowed
> orphan to a forever orphan.
>
> This patch removes the SOCK_DEAD check in tcp_abort, making it send
> reset to peer and close the socket accordingly. Preventing the
> timer-less orphan from happening.
>
> Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> Signed-off-by: Xueming Feng <kuro@...oa.me>

This seems legit, but are you sure these two blamed commits added this bug ?

Even before them, we should have called tcp_done() right away, instead
of waiting for a (possibly long) timer to complete the job.

This might be important when killing millions of sockets on a busy server.

CC Lorenzo

Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?

Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ