[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+rHTU2eVtkc0H=v+8PczfonOxTqc=fCw+6QRwj_3MURg@mail.gmail.com>
Date: Thu, 9 Oct 2025 00:07:44 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: zhengguoyong <zhenggy@...natelecom.cn>
Cc: john.fastabend@...il.com, jakub@...udflare.com, davem@...emloft.net,
kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH] bpf, sockmap: Update tp->rcv_nxt in sk_psock_skb_ingress
On Wed, Oct 8, 2025 at 8:07 PM zhengguoyong <zhenggy@...natelecom.cn> wrote:
>
> When using sockmap to forward TCP traffic to the application
> layer of the peer socket, the peer socket's tcp_bpf_recvmsg_parser
> processing flow will synchronously update the tp->copied_seq field.
> This causes tp->rcv_nxt to become less than tp->copied_seq.
>
> Later, when this socket receives SKB packets from the protocol stack,
> in the call chain tcp_data_ready → tcp_epollin_ready, the function
> tcp_epollin_ready will return false, preventing the socket from being
> woken up to receive new packets.
>
> Therefore, it is necessary to synchronously update the tp->rcv_nxt
> information in sk_psock_skb_ingress.
>
> Signed-off-by: GuoYong Zheng <zhenggy@...natelecom.cn>
Hi GuoYong Zheng
We request a Fixes: tag for patches claiming to fix a bug.
How would stable teams decide to backport a patch or not, and to which versions,
without having to fully understand this code ?
> ---
> net/core/skmsg.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 9becadd..e9d841c 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -576,6 +576,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> struct sock *sk = psock->sk;
> struct sk_msg *msg;
> int err;
> + u32 seq;
>
> /* If we are receiving on the same sock skb->sk is already assigned,
> * skip memory accounting and owner transition seeing it already set
> @@ -595,8 +596,15 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> */
> skb_set_owner_r(skb, sk);
> err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, true);
> - if (err < 0)
> + if (err < 0) {
> kfree(msg);
> + } else {
> + bh_lock_sock_nested(sk);
> + seq = READ_ONCE(tcp_sk(sk)->rcv_nxt) + len;
> + WRITE_ONCE(tcp_sk(sk)->rcv_nxt, seq);
This does not look to be the right place.
Re-locking a socket _after_ the fundamental change took place is
fundamentally racy.
Also do we have a guarantee sk is always a TCP socket at this point ?
If yes, why do we have sk_is_tcp() check in sk_psock_init_strp() ?
> + bh_unlock_sock(sk);
> + }
> +
> return err;
> }
>
> --
> 1.8.3.1
Powered by blists - more mailing lists