Message-ID: <20251008104612.1824200-3-edumazet@google.com>
Date: Wed, 8 Oct 2025 10:46:09 +0000
From: Eric Dumazet <edumazet@...gle.com>
To: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>, Kuniyuki Iwashima <kuniyu@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com,
Eric Dumazet <edumazet@...gle.com>
Subject: [PATCH RFC net-next 2/4] net: control skb->ooo_okay from skb_set_owner_w()
15 years after Tom Herbert added skb->ooo_okay, only TCP transport
benefits from it.
We can support other transports directly from skb_set_owner_w().
If no other TX packet for this socket is in a host queue (qdisc, NIC queue),
there is no risk of self-inflicted reordering, so we can set skb->ooo_okay.
This allows netdev_pick_tx() to choose a TX queue based on XPS settings,
instead of reusing the queue chosen at the time the first packet was sent
for connected sockets.
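
To illustrate the queue choice described above, here is a minimal userspace sketch in C. It is not the kernel's netdev_pick_tx(): the struct, the XPS table and every txq_model_* name are hypothetical, and the real function also handles cases this model ignores (hash fallback, queue count changes). It only shows how a set ooo_okay lets the socket follow the XPS queue of the current CPU, while a cleared ooo_okay keeps the cached queue.

```c
#include <assert.h>
#include <stdbool.h>

#define TXQ_MODEL_NUM_CPUS 4
#define TXQ_MODEL_NO_QUEUE (-1)

/* Hypothetical stand-in for the TX queue cached in a connected socket. */
struct txq_model_sock {
	int cached_queue;	/* TXQ_MODEL_NO_QUEUE until a first packet is sent */
};

/* Hypothetical XPS table: CPU index -> preferred TX queue. */
static const int txq_model_xps_queue[TXQ_MODEL_NUM_CPUS] = { 0, 1, 2, 3 };

/*
 * Simplified model of the decision: pick a fresh queue from the XPS table
 * when nothing is cached yet or when reordering is harmless (ooo_okay),
 * otherwise reuse the cached queue so packets stay in order.
 */
static int txq_model_pick_tx(struct txq_model_sock *sk, bool ooo_okay, int cpu)
{
	assert(cpu >= 0 && cpu < TXQ_MODEL_NUM_CPUS);

	if (sk->cached_queue == TXQ_MODEL_NO_QUEUE || ooo_okay)
		sk->cached_queue = txq_model_xps_queue[cpu];

	return sk->cached_queue;
}
```

In this model, a thread that migrates from CPU 0 to CPU 2 keeps using queue 0 while ooo_okay is clear, and moves to queue 2 on the first packet sent with ooo_okay set.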
Tested:
500 concurrent UDP_RR connected UDP flows, host with 32 TX queues, XPS setup.
super_netperf 500 -t UDP_RR -H <host> -l 1000 -- -r 100,100 -Nn &
This patch saves between 10% and 20% of cycles, depending on how
the process scheduler migrates threads among CPUs.
Using the following bpftrace script, we can see the effect of Qdisc/NIC TX queues
being better used (fewer cache line misses).
bpftrace -e '
k:__dev_queue_xmit { @start[cpu] = nsecs; }
kr:__dev_queue_xmit {
  if (@start[cpu]) {
    $delay = nsecs - @start[cpu];
    delete(@start[cpu]);
    @__dev_queue_xmit_ns = hist($delay);
  }
}
END { clear(@start); }'
Before:
@__dev_queue_xmit_ns:
[128, 256) 6 | |
[256, 512) 116283 | |
[512, 1K) 1888205 |@@@@@@@@@@@ |
[1K, 2K) 8106167 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[2K, 4K) 8699293 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K) 2600676 |@@@@@@@@@@@@@@@ |
[8K, 16K) 721688 |@@@@ |
[16K, 32K) 122995 | |
[32K, 64K) 10639 | |
[64K, 128K) 119 | |
[128K, 256K) 1 | |
After:
@__dev_queue_xmit_ns:
[128, 256) 3 | |
[256, 512) 651112 |@@ |
[512, 1K) 8109938 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 16081031 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K) 2411692 |@@@@@@@ |
[4K, 8K) 98994 | |
[8K, 16K) 1536 | |
[16K, 32K) 587 | |
[32K, 64K) 2 | |
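
One way to summarize the two histograms above is the share of __dev_queue_xmit() calls that complete in under 2 us. The C sketch below is only an illustration: the bucket counts are copied from the output above, while the helper names (hist_sum, hist_share_below_2k) are made up for this example. With these counts the share works out to roughly 45% before the patch and roughly 91% after.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bucket counts copied from the histograms above, lowest bucket [128, 256) first. */
static const uint64_t hist_before[] = {
	6, 116283, 1888205, 8106167, 8699293, 2600676,
	721688, 122995, 10639, 119, 1,
};
static const uint64_t hist_after[] = {
	3, 651112, 8109938, 16081031, 2411692, 98994, 1536, 587, 2,
};

#define HIST_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Sum of bucket counts in the index range [first, end). */
static uint64_t hist_sum(const uint64_t *h, size_t first, size_t end)
{
	uint64_t sum = 0;

	for (size_t i = first; i < end; i++)
		sum += h[i];
	return sum;
}

/* Fraction of samples in the first four buckets, which cover [128, 2K) ns. */
static double hist_share_below_2k(const uint64_t *h, size_t len)
{
	assert(len >= 4);
	return (double)hist_sum(h, 0, 4) / (double)hist_sum(h, 0, len);
}
```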
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
---
net/core/sock.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 542cfa16ee125f6c8487237c9040695d42794087..08ae20069b6d287745800710192396f76c8781b4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2694,6 +2694,8 @@ void __sock_wfree(struct sk_buff *skb)
 void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 {
+	int old_wmem;
+
 	skb_orphan(skb);
 #ifdef CONFIG_INET
 	if (unlikely(!sk_fullsock(sk)))
@@ -2707,7 +2709,15 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 	 * is enough to guarantee sk_free() won't free this sock until
 	 * all in-flight packets are completed
 	 */
-	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
+	__refcount_add(skb->truesize, &sk->sk_wmem_alloc, &old_wmem);
+
+	/* (old_wmem == SK_WMEM_ALLOC_BIAS) if no other TX packet for this socket
+	 * is in a host queue (qdisc, NIC queue).
+	 * Set skb->ooo_okay so that netdev_pick_tx() can choose a TX queue
+	 * based on XPS for better performance.
+	 * Otherwise clear ooo_okay to not risk Out Of Order delivery.
+	 */
+	skb->ooo_okay = (old_wmem == SK_WMEM_ALLOC_BIAS);
 }
 EXPORT_SYMBOL(skb_set_owner_w);
--
2.51.0.710.ga91ca5db03-goog
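
The test in the patch relies on the add returning the counter value from before the addition. The following userspace sketch models that idea with C11 atomics. It is not kernel code: the wmem_model_* names are hypothetical, and the bias value of 1 is an assumption made for this example (the actual SK_WMEM_ALLOC_BIAS definition is not part of this message).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bias for this model only; the real value is defined elsewhere. */
#define WMEM_MODEL_BIAS 1

/* Hypothetical stand-in for the socket's write-memory accounting. */
struct wmem_model_sock {
	atomic_int wmem_alloc;
};

static void wmem_model_init(struct wmem_model_sock *sk)
{
	atomic_init(&sk->wmem_alloc, WMEM_MODEL_BIAS);
}

/*
 * Charge a packet of the given truesize to the socket and report whether
 * reordering is harmless: atomic_fetch_add() returns the old value, and an
 * old value equal to the bias means no other packet was charged at that time.
 */
static bool wmem_model_set_owner_w(struct wmem_model_sock *sk, int truesize)
{
	int old_wmem;

	assert(truesize > 0);
	old_wmem = atomic_fetch_add(&sk->wmem_alloc, truesize);
	return old_wmem == WMEM_MODEL_BIAS;
}

/* Uncharge a packet once it has left the host queues. */
static void wmem_model_wfree(struct wmem_model_sock *sk, int truesize)
{
	atomic_fetch_sub(&sk->wmem_alloc, truesize);
}
```

In this model the first packet charged to an idle socket reports true, a second packet sent while the first is still charged reports false, and the result becomes true again once all earlier packets have been uncharged.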