netdev - Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAL+tcoBJxe6GkosVCS5Vzwk_z8W1WmxqLFELzXNwCRSYkQUyHw@mail.gmail.com>
Date: Thu, 4 Sep 2025 14:19:08 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Kuniyuki Iwashima <kuniyu@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, "David S . Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Neal Cardwell <ncardwell@...gle.com>, Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com
Subject: Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required

On Thu, Sep 4, 2025 at 1:32 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>
> On Wed, Sep 3, 2025 at 10:04 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > If the receive queue contains payload that was already
> > > received, __tcp_close() can send an unexpected RST.
> > >
> > > Refine the code to take tp->copied_seq into account,
> > > as we already do in tcp recvmsg().
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> >
> > Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.
> >
> > IIUC, it's not possible that one skb stays in the receive queue and
> > all of the data has been consumed in tcp_recvmsg() unless it's
> > MSG_PEEK mode. So my understanding is that the patch tries to cover
> > the case where partial data of skb is read by applications and the
> > whole skb has not been unlinked from the receive queue yet. Sure, as
> > we can learn from tcp_sendsmg(), skb can be partially read.
>
> You can find a clear example in patch 2 that this patch fixes.

Oh, great, a very interesting corner case: resending data with FIN....
I just wasn't able to read the second patch in time...

Reviewed-by: Jason Xing <kerneljasonxing@...il.com>

Thanks,
Jason

>
> Without patch 1, the test fails:
>
> # ./ksft_runner.sh tcp_close_no_rst.pkt
> ...
> tcp_close_no_rst.pkt:32: error handling packet: live packet field
> tcp_fin: expected: 1 (0x1) vs actual: 0 (0x0)
> script packet:  0.140854 F. 1:1(0) ack 1002
> actual packet:  0.140844 R. 1:1(0) ack 1002 win 65535
> not ok 1 ipv4
>
>
> >
> > As long as 'TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq' has data
> > len, and the skb still exists in the receive queue, it can directly
> > means some part of skb hasn't been read yet. We can call it the unread
> > data case then, so the logic before this patch is right.
> >
> > Two conditions (1. skb still stays in the queue, 2. skb has data) make
> > sure that the data unread case can be detected and then sends an RST.
> > No need to replace it with copied_seq, I wonder? At least, it's not a
> > bug.
> >
> > Thanks,
> > Jason
> >
> >
> >
> >
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > ---
> > >  net/ipv4/tcp.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 40b774b4f587..39eb03f6d07f 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
> > >
> > >  void __tcp_close(struct sock *sk, long timeout)
> > >  {
> > > +       bool data_was_unread = false;
> > >         struct sk_buff *skb;
> > > -       int data_was_unread = 0;
> > >         int state;
> > >
> > >         WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
> > > @@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
> > >          *  reader process may not have drained the data yet!
> > >          */
> > >         while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> > > -               u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
> > > +               u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> > >
> > >                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> > > -                       len--;
> > > -               data_was_unread += len;
> > > +                       end_seq--;
> > > +               if (after(end_seq, tcp_sk(sk)->copied_seq))
> > > +                       data_was_unread = true;
> > >                 __kfree_skb(skb);
> > >         }
> > >
> > > --
> > > 2.51.0.338.gd7d06c2dae-goog
> > >
> > >