netdev - Re: [PATCH] bpf, sockmap: Update tp->rcv_nxt in sk_psock_skb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89i+rHTU2eVtkc0H=v+8PczfonOxTqc=fCw+6QRwj_3MURg@mail.gmail.com>
Date: Thu, 9 Oct 2025 00:07:44 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: zhengguoyong <zhenggy@...natelecom.cn>
Cc: john.fastabend@...il.com, jakub@...udflare.com, davem@...emloft.net, 
	kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org, 
	bpf@...r.kernel.org
Subject: Re: [PATCH] bpf, sockmap: Update tp->rcv_nxt in sk_psock_skb_ingress

On Wed, Oct 8, 2025 at 8:07 PM zhengguoyong <zhenggy@...natelecom.cn> wrote:
>
> When using sockmap to forward TCP traffic to the application
> layer of the peer socket, the peer socket's tcp_bpf_recvmsg_parser
> processing flow will synchronously update the tp->copied_seq field.
> This causes tp->rcv_nxt to become less than tp->copied_seq.
>
> Later, when this socket receives SKB packets from the protocol stack,
> in the call chain tcp_data_ready → tcp_epollin_ready, the function
> tcp_epollin_ready will return false, preventing the socket from being
> woken up to receive new packets.
>
> Therefore, it is necessary to synchronously update the tp->rcv_nxt
> information in sk_psock_skb_ingress.
>
> Signed-off-by: GuoYong Zheng <zhenggy@...natelecom.cn>

Hi GuoYong Zheng

We request a Fixes: tag for patches claiming to fix a bug.

How would stable teams decide to backport a patch or not, and to which versions,
without having to fully understand this code ?


> ---
>  net/core/skmsg.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 9becadd..e9d841c 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -576,6 +576,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
>         struct sock *sk = psock->sk;
>         struct sk_msg *msg;
>         int err;
> +       u32 seq;
>
>         /* If we are receiving on the same sock skb->sk is already assigned,
>          * skip memory accounting and owner transition seeing it already set
> @@ -595,8 +596,15 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
>          */
>         skb_set_owner_r(skb, sk);
>         err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, true);
> -       if (err < 0)
> +       if (err < 0) {
>                 kfree(msg);
> +       } else {
> +               bh_lock_sock_nested(sk);
> +               seq = READ_ONCE(tcp_sk(sk)->rcv_nxt) + len;
> +               WRITE_ONCE(tcp_sk(sk)->rcv_nxt, seq);

This does not look to be the right place.

Re-locking a socket _after_ the fundamental change took place is
fundamentally racy.

Also do we have a guarantee sk is always a TCP socket at this point ?

If yes, why do we have sk_is_tcp() check in sk_psock_init_strp() ?

> +               bh_unlock_sock(sk);
> +       }
> +
>         return err;
>  }
>
> --
> 1.8.3.1