Message-ID: <CANn89iJwTydUJG4docxfc0soY98BU7=g-nh+ZAvRi6qD5Bt_Ow@mail.gmail.com>
Date: Thu, 6 Nov 2025 01:13:24 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next] net: add prefetch() in skb_defer_free_flush()
On Thu, Nov 6, 2025 at 1:05 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 11/6/25 9:55 AM, Eric Dumazet wrote:
> > skb_defer_free_flush() is becoming more important these days.
> >
> > Add a prefetch operation to reduce latency a bit on some
> > platforms like AMD EPYC 7B12.
> >
> > On more recent cpus, a stall happens when reading skb_shinfo().
> > Avoiding it will require a more elaborate strategy.
>
> For my education, how do you catch such stalls? looking for specific
> perf events? Or just based on cycles spent in a given function/chunk of
> code?
In this case, I was focusing on a NIC driver handling both RX and TX
from a single cpu.
I am using "perf record -g -C one_of_the_hot_cpu sleep 5; perf report --no-children".
I am working on an issue with napi_consume_skb(), which has no NUMA awareness.
With the following WIP series, I can push 115 Mpps of UDP packets
(instead of 80 Mpps) on IDPF.
I need more tests before pushing it for review, but the prefetch()
from skb_defer_free_flush() is a no-brainer.
git diff d24e4780d5783b8eecd33aab03bd4efd24703c65..
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b4bc8b1c7d5674c19b64f8b15685d74632048fe..7ac5f8aa1235a55db02b40b5a0f51bb3fa53fa03
100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1149,11 +1149,10 @@ void skb_release_head_state(struct sk_buff *skb)
skb);
#endif
+ skb->destructor = NULL;
}
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
- nf_conntrack_put(skb_nfct(skb));
-#endif
- skb_ext_put(skb);
+ nf_reset_ct(skb);
+ skb_ext_reset(skb);
}
/* Free everything but the sk_buff shell. */
@@ -1477,6 +1476,11 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
DEBUG_NET_WARN_ON_ONCE(!in_softirq());
+ if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+ skb_release_head_state(skb);
+ return skb_attempt_defer_free(skb);
+ }
+
if (!skb_unref(skb))
return;
commit df7dacc619117ebab7ea330ccc6390618f04dff3
Author: Eric Dumazet <edumazet@...gle.com>
Date: Wed Nov 5 17:02:20 2025 +0000
net: fix napi_consume_skb() with alien skbs
There is a lack of NUMA awareness and more generally lack
of slab caches affinity on TX completion path.
Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
in per-cpu caches so that they can be recycled in RX path.
Only allow this if the skb was allocated on the same cpu,
otherwise use skb_attempt_defer_free() so that the skb
is freed on the original cpu.
This removes contention on SLUB spinlocks and data structures.
After this patch, I get 40% improvement for a UDP tx workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).
80 Mpps -> 115 Mpps.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
commit 42593ad5f2bed6abd3a6cce3483e2980b114cbd9
Author: Eric Dumazet <edumazet@...gle.com>
Date: Wed Nov 5 16:50:29 2025 +0000
net: allow skb_release_head_state() to be called multiple times
Currently, only the skb dst is cleared (via skb_dst_drop()).
Make sure skb->destructor, conntrack state and extensions are also cleared.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>