[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iK6q2r99UOMbrhpCWHZF+FshSs1dpVKttsOnHidVVvTkQ@mail.gmail.com>
Date: Tue, 25 Feb 2025 19:43:19 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Yuchung Cheng <ycheng@...gle.com>
Cc: Neal Cardwell <ncardwell@...gle.com>, Youngmin Nam <youngmin.nam@...sung.com>,
Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net, dsahern@...nel.org,
pabeni@...hat.com, horms@...nel.org, guo88.liu@...sung.com,
yiwang.cai@...sung.com, netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
joonki.min@...sung.com, hajun.sung@...sung.com, d7271.choe@...sung.com,
sw.ju@...sung.com, "Dujeong.lee" <dujeong.lee@...sung.com>, Kevin Yang <yyd@...gle.com>,
Xueming Feng <kuro@...oa.me>
Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON
On Tue, Feb 25, 2025 at 7:28 PM Yuchung Cheng <ycheng@...gle.com> wrote:
>
> On Tue, Feb 25, 2025 at 9:25 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> >
> > On Mon, Feb 24, 2025 at 4:13 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > >
> > > On Mon, Feb 3, 2025 at 12:17 AM Youngmin Nam <youngmin.nam@...sung.com> wrote:
> > > >
> > > > > Hi Neal,
> > > > > Thank you for looking into this issue.
> > > > > When we first encountered this issue, we also suspected that tcp_write_queue_purge() was being called.
> > > > > We can provide any information you would like to inspect.
> > >
> > > Thanks again for raising this issue, and providing all that data!
> > >
> > > I've come up with a reproducer for this issue, and an explanation for
> > > why this has only been seen on Android so far, and a theory about a
> > > related socket leak issue, and a proposed fix for the WARN and the
> > > socket leak.
> > >
> > > Here is the scenario:
> > >
> > > + user process A has a socket in TCP_ESTABLISHED
> > >
> > > + user process A calls close(fd)
> > >
> > > + socket calls __tcp_close() and tcp_close_state() decides to enter
> > > TCP_FIN_WAIT1 and send a FIN
> > >
> > > + FIN is lost and retransmitted, making the state:
> > > ---
> > > tp->packets_out = 1
> > > tp->sacked_out = 0
> > > tp->lost_out = 1
> > > tp->retrans_out = 1
> > > ---
> > >
> > > + someone invokes "ss" to --kill the socket using the functionality in
> > > (1e64e298b8 "net: diag: Support destroying TCP sockets")
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1e64e298b8cad309091b95d8436a0255c84f54a
> > >
> > > (note: this was added for Android, so would not be surprising to have
> > > this inet_diag --kill run on Android)
> > >
> > > + the ss --kill causes a call to tcp_abort()
> > >
> > > + tcp_abort() calls tcp_write_queue_purge()
> > >
> > > + tcp_write_queue_purge() sets packets_out=0 but leaves lost_out=1,
> > > retrans_out=1
> > >
> > > + tcp_sock still exists in TCP_FIN_WAIT1 but now with an inconsistent state
> > >
> > > + ACK arrives and causes a WARN_ON from tcp_verify_left_out():
> > >
> > > #define tcp_verify_left_out(tp) WARN_ON(tcp_left_out(tp) > tp->packets_out)
> > >
> > > because the state has:
> > >
> > > ---
> > > tcp_left_out(tp) = sacked_out + lost_out = 1
> > > tp->packets_out = 0
> > > ---
> > >
> > > because the state is:
> > >
> > > ---
> > > tp->packets_out = 0
> > > tp->sacked_out = 0
> > > tp->lost_out = 1
> > > tp->retrans_out = 1
> > > ---
> > >
> > > I guess perhaps one fix would be to just have tcp_write_queue_purge()
> > > zero out those other fields:
> > >
> > > ---
> > > tp->sacked_out = 0
> > > tp->lost_out = 0
> > > tp->retrans_out = 0
> > > ---
> > >
> > > However, there is a related and worse problem. Because this killed
> > > socket has tp->packets_out, the next time the RTO timer fires,
> Zeroing all inflights stats in tcp_write_queue_purge still makes sense
> to me. Why will the RTO timer still fire if packets_out is zeroed?
By definition, tcp_write_queue_purge() must only happen when the
socket reaches a final state.
No further transmit is possible, since this broke a major TCP
principle (stream mode, no sendmsg() can be zapped)
tcp_write_timer_handler() immediately returns if the final state is reached.
if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) ..
return;
Also look at INET_CSK_CLEAR_TIMERS, if you want to know why the
retransmit timer can fire.
Powered by blists - more mailing lists