[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMArcTW=mg2gF_e6spPWOCuQdDAWSuKTCdCNPWGqcU1ciq30EQ@mail.gmail.com>
Date: Sat, 17 Aug 2024 22:58:05 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-alpha@...r.kernel.org,
linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
sparclinux@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
linux-arch@...r.kernel.org, linux-kselftest@...r.kernel.org,
bpf@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Donald Hunter <donald.hunter@...il.com>, Jonathan Corbet <corbet@....net>,
Richard Henderson <richard.henderson@...aro.org>, Ivan Kokshaysky <ink@...assic.park.msu.ru>,
Matt Turner <mattst88@...il.com>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>, Helge Deller <deller@....de>,
Andreas Larsson <andreas@...sler.com>, Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Arnd Bergmann <arnd@...db.de>, Steffen Klassert <steffen.klassert@...unet.com>,
Herbert Xu <herbert@...dor.apana.org.au>, David Ahern <dsahern@...nel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>, Shuah Khan <shuah@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Bagas Sanjaya <bagasdotme@...il.com>, Christoph Hellwig <hch@...radead.org>,
Nikolay Aleksandrov <razor@...ckwall.org>, Pavel Begunkov <asml.silence@...il.com>, David Wei <dw@...idwei.uk>,
Jason Gunthorpe <jgg@...pe.ca>, Yunsheng Lin <linyunsheng@...wei.com>,
Shailend Chand <shailend@...gle.com>, Harshitha Ramamurthy <hramamurthy@...gle.com>,
Shakeel Butt <shakeel.butt@...ux.dev>, Jeroen de Borst <jeroendb@...gle.com>,
Praveen Kaligineedi <pkaligineedi@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [PATCH net-next v19 09/13] tcp: RX path for devmem TCP
On Wed, Aug 14, 2024 at 6:13 AM Mina Almasry <almasrymina@...gle.com> wrote:
>
Hi Mina,
> In tcp_recvmsg_locked(), detect if the skb being received by the user
> is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM
> flag - pass it to tcp_recvmsg_devmem() for custom handling.
>
> tcp_recvmsg_devmem() copies any data in the skb header to the linear
> buffer, and returns a cmsg to the user indicating the number of bytes
> returned in the linear buffer.
>
> tcp_recvmsg_devmem() then loops over the unaccessible devmem skb frags,
> and returns to the user a cmsg_devmem indicating the location of the
> data in the dmabuf device memory. cmsg_devmem contains this information:
>
> 1. the offset into the dmabuf where the payload starts. 'frag_offset'.
I have been testing this patch and I found a bug.
While testing it with the ncdevmem cmd, it fails to validate buffers
after some period.
This is because tcp_recvmsg_dmabuf() can't handle skb properly when
the parameter offset != 0.
The tcp_recvmsg_dmabuf() already has the code that handles skb if
offset is not 0 but it doesn't work for a specific case.
> 2. the size of the frag. 'frag_size'.
> 3. an opaque token 'frag_token' to return to the kernel when the buffer
> is to be released.
>
> The pages awaiting freeing are stored in the newly added
> sk->sk_user_frags, and each page passed to userspace is get_page()'d.
> This reference is dropped once the userspace indicates that it is
> done reading this page. All pages are released when the socket is
> destroyed.
>
> Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> Signed-off-by: Kaiyuan Zhang <kaiyuanz@...gle.com>
> Signed-off-by: Mina Almasry <almasrymina@...gle.com>
> Reviewed-by: Pavel Begunkov <asml.silence@...il.com>
> Reviewed-by: Eric Dumazet <edumazet@...gle.com>
>
> ---
>
> v16:
> - Fix number assignement (Arnd).
>
> v13:
> - Refactored user frags cleanup into a common function to avoid
> __maybe_unused. (Pavel)
> - change to offset = 0 for some improved clarity.
>
> v11:
> - Refactor to common function te remove conditional lock sparse warning
> (Paolo)
>
> v7:
> - Updated the SO_DEVMEM_* uapi to use the next available entries (Arnd).
> - Updated dmabuf_cmsg struct to be __u64 padded (Arnd).
> - Squashed fix from Eric to initialize sk_user_frags for passive
> sockets (Eric).
>
> v6
> - skb->dmabuf -> skb->readable (Pavel)
> - Fixed asm definitions of SO_DEVMEM_LINEAR/SO_DEVMEM_DMABUF not found
> on some archs.
> - Squashed in locking optimizations from edumazet@...gle.com. With this
> change we lock the xarray once per per tcp_recvmsg_dmabuf() rather
> than once per frag in xa_alloc().
>
> Changes in v1:
> - Added dmabuf_id to dmabuf_cmsg (David/Stan).
> - Devmem -> dmabuf (David).
> - Change tcp_recvmsg_dmabuf() check to skb->dmabuf (Paolo).
> - Use __skb_frag_ref() & napi_pp_put_page() for refcounting (Yunsheng).
>
> RFC v3:
> - Fixed issue with put_cmsg() failing silently.
>
> ---
> arch/alpha/include/uapi/asm/socket.h | 5 +
> arch/mips/include/uapi/asm/socket.h | 5 +
> arch/parisc/include/uapi/asm/socket.h | 5 +
> arch/sparc/include/uapi/asm/socket.h | 5 +
> include/linux/socket.h | 1 +
> include/net/netmem.h | 13 ++
> include/net/sock.h | 2 +
> include/uapi/asm-generic/socket.h | 5 +
> include/uapi/linux/uio.h | 13 ++
> net/ipv4/tcp.c | 255 +++++++++++++++++++++++++-
> net/ipv4/tcp_ipv4.c | 16 ++
> net/ipv4/tcp_minisocks.c | 2 +
> 12 files changed, 322 insertions(+), 5 deletions(-)
>
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index e94f621903fe..ef4656a41058 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -140,6 +140,11 @@
> #define SO_PASSPIDFD 76
> #define SO_PEERPIDFD 77
>
> +#define SO_DEVMEM_LINEAR 78
> +#define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
> +#define SO_DEVMEM_DMABUF 79
> +#define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index 60ebaed28a4c..414807d55e33 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -151,6 +151,11 @@
> #define SO_PASSPIDFD 76
> #define SO_PEERPIDFD 77
>
> +#define SO_DEVMEM_LINEAR 78
> +#define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
> +#define SO_DEVMEM_DMABUF 79
> +#define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index be264c2b1a11..2b817efd4544 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -132,6 +132,11 @@
> #define SO_PASSPIDFD 0x404A
> #define SO_PEERPIDFD 0x404B
>
> +#define SO_DEVMEM_LINEAR 78
> +#define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
> +#define SO_DEVMEM_DMABUF 79
> +#define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 682da3714686..00248fc68977 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -133,6 +133,11 @@
> #define SO_PASSPIDFD 0x0055
> #define SO_PEERPIDFD 0x0056
>
> +#define SO_DEVMEM_LINEAR 0x0057
> +#define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
> +#define SO_DEVMEM_DMABUF 0x0058
> +#define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
> +
> #if !defined(__KERNEL__)
>
>
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index df9cdb8bbfb8..d18cc47e89bd 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -327,6 +327,7 @@ struct ucred {
> * plain text and require encryption
> */
>
> +#define MSG_SOCK_DEVMEM 0x2000000 /* Receive devmem skbs as cmsg */
> #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */
> #define MSG_SPLICE_PAGES 0x8000000 /* Splice the pages from the iterator in sendmsg() */
> #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 284f84a312c2..84043fbdd797 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -65,6 +65,19 @@ static inline unsigned int net_iov_idx(const struct net_iov *niov)
> return niov - net_iov_owner(niov)->niovs;
> }
>
> +static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
> +{
> + struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
> +
> + return owner->base_virtual +
> + ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
> +}
> +
> +static inline u32 net_iov_binding_id(const struct net_iov *niov)
> +{
> + return net_iov_owner(niov)->binding->id;
> +}
> +
> static inline struct net_devmem_dmabuf_binding *
> net_iov_binding(const struct net_iov *niov)
> {
> diff --git a/include/net/sock.h b/include/net/sock.h
> index cce23ac4d514..f8ec869be238 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -337,6 +337,7 @@ struct sk_filter;
> * @sk_txtime_report_errors: set report errors mode for SO_TXTIME
> * @sk_txtime_unused: unused txtime flags
> * @ns_tracker: tracker for netns reference
> + * @sk_user_frags: xarray of pages the user is holding a reference on.
> */
> struct sock {
> /*
> @@ -542,6 +543,7 @@ struct sock {
> #endif
> struct rcu_head sk_rcu;
> netns_tracker ns_tracker;
> + struct xarray sk_user_frags;
> };
>
> struct sock_bh_locked {
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 8ce8a39a1e5f..e993edc9c0ee 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -135,6 +135,11 @@
> #define SO_PASSPIDFD 76
> #define SO_PEERPIDFD 77
>
> +#define SO_DEVMEM_LINEAR 78
> +#define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
> +#define SO_DEVMEM_DMABUF 79
> +#define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
> +
> #if !defined(__KERNEL__)
>
> #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h
> index 059b1a9147f4..3a22ddae376a 100644
> --- a/include/uapi/linux/uio.h
> +++ b/include/uapi/linux/uio.h
> @@ -20,6 +20,19 @@ struct iovec
> __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
> };
>
> +struct dmabuf_cmsg {
> + __u64 frag_offset; /* offset into the dmabuf where the frag starts.
> + */
> + __u32 frag_size; /* size of the frag. */
> + __u32 frag_token; /* token representing this frag for
> + * DEVMEM_DONTNEED.
> + */
> + __u32 dmabuf_id; /* dmabuf id this frag belongs to. */
> + __u32 flags; /* Currently unused. Reserved for future
> + * uses.
> + */
> +};
> +
> /*
> * UIO_MAXIOV shall be at least 16 1003.1g (5.4.1.1)
> */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 30e0aa38ba9b..40e7335dae6e 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -471,6 +471,7 @@ void tcp_init_sock(struct sock *sk)
>
> set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
> sk_sockets_allocated_inc(sk);
> + xa_init_flags(&sk->sk_user_frags, XA_FLAGS_ALLOC1);
> }
> EXPORT_SYMBOL(tcp_init_sock);
>
> @@ -2323,6 +2324,220 @@ static int tcp_inq_hint(struct sock *sk)
> return inq;
> }
>
> +/* batch __xa_alloc() calls and reduce xa_lock()/xa_unlock() overhead. */
> +struct tcp_xa_pool {
> + u8 max; /* max <= MAX_SKB_FRAGS */
> + u8 idx; /* idx <= max */
> + __u32 tokens[MAX_SKB_FRAGS];
> + netmem_ref netmems[MAX_SKB_FRAGS];
> +};
> +
> +static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)
> +{
> + int i;
> +
> + /* Commit part that has been copied to user space. */
> + for (i = 0; i < p->idx; i++)
> + __xa_cmpxchg(&sk->sk_user_frags, p->tokens[i], XA_ZERO_ENTRY,
> + (__force void *)p->netmems[i], GFP_KERNEL);
> + /* Rollback what has been pre-allocated and is no longer needed. */
> + for (; i < p->max; i++)
> + __xa_erase(&sk->sk_user_frags, p->tokens[i]);
> +
> + p->max = 0;
> + p->idx = 0;
> +}
> +
> +static void tcp_xa_pool_commit(struct sock *sk, struct tcp_xa_pool *p)
> +{
> + if (!p->max)
> + return;
> +
> + xa_lock_bh(&sk->sk_user_frags);
> +
> + tcp_xa_pool_commit_locked(sk, p);
> +
> + xa_unlock_bh(&sk->sk_user_frags);
> +}
> +
> +static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
> + unsigned int max_frags)
> +{
> + int err, k;
> +
> + if (p->idx < p->max)
> + return 0;
> +
> + xa_lock_bh(&sk->sk_user_frags);
> +
> + tcp_xa_pool_commit_locked(sk, p);
> +
> + for (k = 0; k < max_frags; k++) {
> + err = __xa_alloc(&sk->sk_user_frags, &p->tokens[k],
> + XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL);
> + if (err)
> + break;
> + }
> +
> + xa_unlock_bh(&sk->sk_user_frags);
> +
> + p->max = k;
> + p->idx = 0;
> + return k ? 0 : err;
> +}
> +
> +/* On error, returns the -errno. On success, returns number of bytes sent to the
> + * user. May not consume all of @remaining_len.
> + */
> +static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
> + unsigned int offset, struct msghdr *msg,
> + int remaining_len)
> +{
> + struct dmabuf_cmsg dmabuf_cmsg = { 0 };
> + struct tcp_xa_pool tcp_xa_pool;
> + unsigned int start;
> + int i, copy, n;
> + int sent = 0;
> + int err = 0;
> +
> + tcp_xa_pool.max = 0;
> + tcp_xa_pool.idx = 0;
> + do {
> + start = skb_headlen(skb);
> +
> + if (skb_frags_readable(skb)) {
> + err = -ENODEV;
> + goto out;
> + }
> +
> + /* Copy header. */
> + copy = start - offset;
> + if (copy > 0) {
> + copy = min(copy, remaining_len);
> +
> + n = copy_to_iter(skb->data + offset, copy,
> + &msg->msg_iter);
> + if (n != copy) {
> + err = -EFAULT;
> + goto out;
> + }
> +
> + offset += copy;
> + remaining_len -= copy;
> +
> + /* First a dmabuf_cmsg for # bytes copied to user
> + * buffer.
> + */
> + memset(&dmabuf_cmsg, 0, sizeof(dmabuf_cmsg));
> + dmabuf_cmsg.frag_size = copy;
> + err = put_cmsg(msg, SOL_SOCKET, SO_DEVMEM_LINEAR,
> + sizeof(dmabuf_cmsg), &dmabuf_cmsg);
> + if (err || msg->msg_flags & MSG_CTRUNC) {
> + msg->msg_flags &= ~MSG_CTRUNC;
> + if (!err)
> + err = -ETOOSMALL;
> + goto out;
> + }
> +
> + sent += copy;
> +
> + if (remaining_len == 0)
> + goto out;
> + }
> +
> + /* after that, send information of dmabuf pages through a
> + * sequence of cmsg
> + */
> + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> + skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> + struct net_iov *niov;
> + u64 frag_offset;
> + int end;
> +
> + /* !skb_frags_readable() should indicate that ALL the
> + * frags in this skb are dmabuf net_iovs. We're checking
> + * for that flag above, but also check individual frags
> + * here. If the tcp stack is not setting
> + * skb_frags_readable() correctly, we still don't want
> + * to crash here.
> + */
> + if (!skb_frag_net_iov(frag)) {
> + net_err_ratelimited("Found non-dmabuf skb with net_iov");
> + err = -ENODEV;
> + goto out;
> + }
> +
> + niov = skb_frag_net_iov(frag);
> + end = start + skb_frag_size(frag);
> + copy = end - offset;
> +
> + if (copy > 0) {
> + copy = min(copy, remaining_len);
> +
> + frag_offset = net_iov_virtual_addr(niov) +
> + skb_frag_off(frag) + offset -
> + start;
> + dmabuf_cmsg.frag_offset = frag_offset;
> + dmabuf_cmsg.frag_size = copy;
> + err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
> + skb_shinfo(skb)->nr_frags - i);
> + if (err)
> + goto out;
> +
> + /* Will perform the exchange later */
> + dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
> + dmabuf_cmsg.dmabuf_id = net_iov_binding_id(niov);
> +
> + offset += copy;
> + remaining_len -= copy;
> +
> + err = put_cmsg(msg, SOL_SOCKET,
> + SO_DEVMEM_DMABUF,
> + sizeof(dmabuf_cmsg),
> + &dmabuf_cmsg);
> + if (err || msg->msg_flags & MSG_CTRUNC) {
> + msg->msg_flags &= ~MSG_CTRUNC;
> + if (!err)
> + err = -ETOOSMALL;
> + goto out;
> + }
> +
> + atomic_long_inc(&niov->pp_ref_count);
> + tcp_xa_pool.netmems[tcp_xa_pool.idx++] = skb_frag_netmem(frag);
> +
> + sent += copy;
> +
> + if (remaining_len == 0)
> + goto out;
> + }
> + start = end;
> + }
> +
> + tcp_xa_pool_commit(sk, &tcp_xa_pool);
> + if (!remaining_len)
> + goto out;
> +
> + /* if remaining_len is not satisfied yet, we need to go to the
> + * next frag in the frag_list to satisfy remaining_len.
> + */
> + skb = skb_shinfo(skb)->frag_list ?: skb->next;
> +
> + offset = 0;
If the offset is 5000 and only 4500 bytes are skipped at this point,
the offset should be 500, not 0.
We need to add a condition to set the offset correctly.
> + } while (skb);
> +
> + if (remaining_len) {
> + err = -EFAULT;
> + goto out;
> + }
> +
> +out:
> + tcp_xa_pool_commit(sk, &tcp_xa_pool);
> + if (!sent)
> + sent = err;
> +
> + return sent;
> +}
> +
> /*
> * This routine copies from a sock struct into the user buffer.
> *
> @@ -2336,6 +2551,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> int *cmsg_flags)
> {
> struct tcp_sock *tp = tcp_sk(sk);
> + int last_copied_dmabuf = -1; /* uninitialized */
> int copied = 0;
> u32 peek_seq;
> u32 *seq;
> @@ -2515,15 +2731,44 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> }
>
> if (!(flags & MSG_TRUNC)) {
> - err = skb_copy_datagram_msg(skb, offset, msg, used);
> - if (err) {
> - /* Exception. Bailout! */
> - if (!copied)
> - copied = -EFAULT;
> + if (last_copied_dmabuf != -1 &&
> + last_copied_dmabuf != !skb_frags_readable(skb))
> break;
> +
> + if (skb_frags_readable(skb)) {
> + err = skb_copy_datagram_msg(skb, offset, msg,
> + used);
> + if (err) {
> + /* Exception. Bailout! */
> + if (!copied)
> + copied = -EFAULT;
> + break;
> + }
> + } else {
> + if (!(flags & MSG_SOCK_DEVMEM)) {
> + /* dmabuf skbs can only be received
> + * with the MSG_SOCK_DEVMEM flag.
> + */
> + if (!copied)
> + copied = -EFAULT;
> +
> + break;
> + }
> +
> + err = tcp_recvmsg_dmabuf(sk, skb, offset, msg,
> + used);
> + if (err <= 0) {
> + if (!copied)
> + copied = -EFAULT;
> +
> + break;
> + }
> + used = err;
> }
> }
>
> + last_copied_dmabuf = !skb_frags_readable(skb);
> +
> WRITE_ONCE(*seq, *seq + used);
> copied += used;
> len -= used;
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fd17f25ff288..f3b2ae0823c4 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -79,6 +79,7 @@
> #include <linux/seq_file.h>
> #include <linux/inetdevice.h>
> #include <linux/btf_ids.h>
> +#include <linux/skbuff_ref.h>
>
> #include <crypto/hash.h>
> #include <linux/scatterlist.h>
> @@ -2507,10 +2508,25 @@ static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
> }
> #endif
>
> +static void tcp_release_user_frags(struct sock *sk)
> +{
> +#ifdef CONFIG_PAGE_POOL
> + unsigned long index;
> + void *netmem;
> +
> + xa_for_each(&sk->sk_user_frags, index, netmem)
> + WARN_ON_ONCE(!napi_pp_put_page((__force netmem_ref)netmem));
> +#endif
> +}
> +
> void tcp_v4_destroy_sock(struct sock *sk)
> {
> struct tcp_sock *tp = tcp_sk(sk);
>
> + tcp_release_user_frags(sk);
> +
> + xa_destroy(&sk->sk_user_frags);
> +
> trace_tcp_destroy_sock(sk);
>
> tcp_clear_xmit_timers(sk);
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index a19a9dbd3409..9ab87a41255d 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -625,6 +625,8 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
>
> __TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
>
> + xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1);
> +
> return newsk;
> }
> EXPORT_SYMBOL(tcp_create_openreq_child);
> --
> 2.46.0.76.ge559c4bf1a-goog
>
I have been testing with modified code like below, it has been working
correctly for 24+ hours.
This modification is only for simple testing.
So, could you please look into this problem?
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 40e7335dae6e..b9df6ac28477 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2396,9 +2396,11 @@ static int tcp_recvmsg_dmabuf(struct sock *sk,
const struct sk_buff *skb,
struct dmabuf_cmsg dmabuf_cmsg = { 0 };
struct tcp_xa_pool tcp_xa_pool;
unsigned int start;
+ int skip = offset;
int i, copy, n;
int sent = 0;
int err = 0;
+ int end = 0;
tcp_xa_pool.max = 0;
tcp_xa_pool.idx = 0;
@@ -2452,7 +2454,6 @@ static int tcp_recvmsg_dmabuf(struct sock *sk,
const struct sk_buff *skb,
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
struct net_iov *niov;
u64 frag_offset;
- int end;
/* !skb_frags_readable() should indicate that ALL the
* frags in this skb are dmabuf net_iovs. We're checking
@@ -2522,7 +2523,14 @@ static int tcp_recvmsg_dmabuf(struct sock *sk,
const struct sk_buff *skb,
*/
skb = skb_shinfo(skb)->frag_list ?: skb->next;
- offset = 0;
+ if (skip > 0) {
+ skip -= end;
+ offset = skip;
+ }
+ if (skip <= 0) {
+ offset = 0;
+ skip = 0;
+ }
} while (skb);
if (remaining_len) {
Thanks a lot!
Taehee Yoo
Powered by blists - more mailing lists