[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoBJxe6GkosVCS5Vzwk_z8W1WmxqLFELzXNwCRSYkQUyHw@mail.gmail.com>
Date: Thu, 4 Sep 2025 14:19:08 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Kuniyuki Iwashima <kuniyu@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Neal Cardwell <ncardwell@...gle.com>, Simon Horman <horms@...nel.org>, netdev@...r.kernel.org,
eric.dumazet@...il.com
Subject: Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
On Thu, Sep 4, 2025 at 1:32 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>
> On Wed, Sep 3, 2025 at 10:04 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > If the receive queue contains payload that was already
> > > received, __tcp_close() can send an unexpected RST.
> > >
> > > Refine the code to take tp->copied_seq into account,
> > > as we already do in tcp recvmsg().
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> >
> > Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.
> >
> > IIUC, it's not possible that one skb stays in the receive queue and
> > all of the data has been consumed in tcp_recvmsg() unless it's
> > MSG_PEEK mode. So my understanding is that the patch tries to cover
> > the case where partial data of skb is read by applications and the
> > whole skb has not been unlinked from the receive queue yet. Sure, as
> > we can learn from tcp_sendsmg(), skb can be partially read.
>
> You can find a clear example in patch 2 that this patch fixes.
Oh, great, a very interesting corner case: resending data with FIN....
I just wasn't able to read the second patch in time...
Reviewed-by: Jason Xing <kerneljasonxing@...il.com>
Thanks,
Jason
>
> Without patch 1, the test fails:
>
> # ./ksft_runner.sh tcp_close_no_rst.pkt
> ...
> tcp_close_no_rst.pkt:32: error handling packet: live packet field
> tcp_fin: expected: 1 (0x1) vs actual: 0 (0x0)
> script packet: 0.140854 F. 1:1(0) ack 1002
> actual packet: 0.140844 R. 1:1(0) ack 1002 win 65535
> not ok 1 ipv4
>
>
> >
> > As long as 'TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq' has data
> > len, and the skb still exists in the receive queue, it can directly
> > means some part of skb hasn't been read yet. We can call it the unread
> > data case then, so the logic before this patch is right.
> >
> > Two conditions (1. skb still stays in the queue, 2. skb has data) make
> > sure that the data unread case can be detected and then sends an RST.
> > No need to replace it with copied_seq, I wonder? At least, it's not a
> > bug.
> >
> > Thanks,
> > Jason
> >
> >
> >
> >
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > ---
> > > net/ipv4/tcp.c | 9 +++++----
> > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 40b774b4f587..39eb03f6d07f 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
> > >
> > > void __tcp_close(struct sock *sk, long timeout)
> > > {
> > > + bool data_was_unread = false;
> > > struct sk_buff *skb;
> > > - int data_was_unread = 0;
> > > int state;
> > >
> > > WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
> > > @@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
> > > * reader process may not have drained the data yet!
> > > */
> > > while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> > > - u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
> > > + u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> > >
> > > if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> > > - len--;
> > > - data_was_unread += len;
> > > + end_seq--;
> > > + if (after(end_seq, tcp_sk(sk)->copied_seq))
> > > + data_was_unread = true;
> > > __kfree_skb(skb);
> > > }
> > >
> > > --
> > > 2.51.0.338.gd7d06c2dae-goog
> > >
> > >
Powered by blists - more mailing lists