[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAVpQUBgCyC+y+2M7=WKJVk=sivgeZtE2kwCxDLFCrgezycjZg@mail.gmail.com>
Date: Wed, 3 Sep 2025 22:31:58 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>, "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Neal Cardwell <ncardwell@...gle.com>, Simon Horman <horms@...nel.org>, netdev@...r.kernel.org,
eric.dumazet@...il.com
Subject: Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
On Wed, Sep 3, 2025 at 10:04 PM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > If the receive queue contains payload that was already
> > received, __tcp_close() can send an unexpected RST.
> >
> > Refine the code to take tp->copied_seq into account,
> > as we already do in tcp recvmsg().
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>
> Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.
>
> IIUC, it's not possible that one skb stays in the receive queue and
> all of the data has been consumed in tcp_recvmsg() unless it's
> MSG_PEEK mode. So my understanding is that the patch tries to cover
> the case where partial data of skb is read by applications and the
> whole skb has not been unlinked from the receive queue yet. Sure, as
> we can learn from tcp_sendsmg(), skb can be partially read.
You can find a clear example in patch 2 that this patch fixes.
Without patch 1, the test fails:
# ./ksft_runner.sh tcp_close_no_rst.pkt
...
tcp_close_no_rst.pkt:32: error handling packet: live packet field
tcp_fin: expected: 1 (0x1) vs actual: 0 (0x0)
script packet: 0.140854 F. 1:1(0) ack 1002
actual packet: 0.140844 R. 1:1(0) ack 1002 win 65535
not ok 1 ipv4
>
> As long as 'TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq' has data
> len, and the skb still exists in the receive queue, it can directly
> means some part of skb hasn't been read yet. We can call it the unread
> data case then, so the logic before this patch is right.
>
> Two conditions (1. skb still stays in the queue, 2. skb has data) make
> sure that the data unread case can be detected and then sends an RST.
> No need to replace it with copied_seq, I wonder? At least, it's not a
> bug.
>
> Thanks,
> Jason
>
>
>
>
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > ---
> > net/ipv4/tcp.c | 9 +++++----
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 40b774b4f587..39eb03f6d07f 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
> >
> > void __tcp_close(struct sock *sk, long timeout)
> > {
> > + bool data_was_unread = false;
> > struct sk_buff *skb;
> > - int data_was_unread = 0;
> > int state;
> >
> > WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
> > @@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
> > * reader process may not have drained the data yet!
> > */
> > while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> > - u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
> > + u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> >
> > if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> > - len--;
> > - data_was_unread += len;
> > + end_seq--;
> > + if (after(end_seq, tcp_sk(sk)->copied_seq))
> > + data_was_unread = true;
> > __kfree_skb(skb);
> > }
> >
> > --
> > 2.51.0.338.gd7d06c2dae-goog
> >
> >
Powered by blists - more mailing lists