[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250627085745.53173-2-kerneljasonxing@gmail.com>
Date: Fri, 27 Jun 2025 16:57:44 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
bjorn@...nel.org,
magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com,
jonathan.lemon@...il.com,
sdf@...ichev.me,
ast@...nel.org,
daniel@...earbox.net,
hawk@...nel.org,
john.fastabend@...il.com,
joe@...a.to,
willemdebruijn.kernel@...il.com
Cc: bpf@...r.kernel.org,
netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: [PATCH net-next v4 1/2] net: xsk: update tx queue consumer immediately after transmission
From: Jason Xing <kernelxing@...cent.com>
For afxdp, the return value of sendto() syscall doesn't reflect how many
descs handled in the kernel. One of use cases is that when user-space
application tries to know the number of transmitted skbs and then decides
if it continues to send, say, is it stopped due to max tx budget?
The following formular can be used after sending to learn how many
skbs/descs the kernel takes care of:
tx_queue.consumers_before - tx_queue.consumers_after
Prior to the current patch, in non-zc mode, the consumer of tx queue is
not immediately updated at the end of each sendto syscall when error
occurs, which leads to the consumer value out-of-dated from the perspective
of user space. So this patch requires store operation to pass the cached
value to the shared value to handle the problem.
More than those explicit errors appearing in the while() loop in
__xsk_generic_xmit(), there are a few possible error cases that might
be neglected in the following call trace:
__xsk_generic_xmit()
xskq_cons_peek_desc()
xskq_cons_read_desc()
xskq_cons_is_valid_desc()
It will also cause the premature exit in the while() loop even if not
all the descs are consumed.
Based on the above analysis, using @sent_frame could cover all the possible
cases where it might lead to out-of-dated global state of consumer after
finishing __xsk_generic_xmit().
The patch also adds a common helper __xsk_tx_release() to keep align
with the zc mode usage in xsk_tx_release().
Signed-off-by: Jason Xing <kernelxing@...cent.com>
---
v4
Link: https://lore.kernel.org/all/20250625101014.45066-1-kerneljasonxing@gmail.com/
1. use the common helper
2. keep align with the zc mode usage in xsk_tx_release()
v3
Link: https://lore.kernel.org/all/20250623073129.23290-1-kerneljasonxing@gmail.com/
1. use xskq_has_descs helper.
2. add selftest
V2
Link: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/
1. filter out those good cases because only those that return error need
updates.
Side note:
1. in non-batched zero copy mode, at the end of every caller of
xsk_tx_peek_desc(), there is always a xsk_tx_release() function that used
to update the local consumer to the global state of consumer. So for the
zero copy mode, no need to change at all.
2. Actually I have no strong preference between v1 (see the above link)
and v2 because smp_store_release() shouldn't cause side effect.
Considering the exactitude of writing code, v2 is a more preferable
one.
---
net/xdp/xsk.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 72c000c0ae5f..bd61b0bc9c24 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -300,6 +300,13 @@ static bool xsk_tx_writeable(struct xdp_sock *xs)
return true;
}
+static void __xsk_tx_release(struct xdp_sock *xs)
+{
+ __xskq_cons_release(xs->tx);
+ if (xsk_tx_writeable(xs))
+ xs->sk.sk_write_space(&xs->sk);
+}
+
static bool xsk_is_bound(struct xdp_sock *xs)
{
if (READ_ONCE(xs->state) == XSK_BOUND) {
@@ -407,11 +414,8 @@ void xsk_tx_release(struct xsk_buff_pool *pool)
struct xdp_sock *xs;
rcu_read_lock();
- list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
- __xskq_cons_release(xs->tx);
- if (xsk_tx_writeable(xs))
- xs->sk.sk_write_space(&xs->sk);
- }
+ list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list)
+ __xsk_tx_release(xs);
rcu_read_unlock();
}
EXPORT_SYMBOL(xsk_tx_release);
@@ -858,8 +862,7 @@ static int __xsk_generic_xmit(struct sock *sk)
out:
if (sent_frame)
- if (xsk_tx_writeable(xs))
- sk->sk_write_space(sk);
+ __xsk_tx_release(xs);
mutex_unlock(&xs->mutex);
return err;
--
2.41.3
Powered by blists - more mailing lists