Message-ID: <CANn89iJwTydUJG4docxfc0soY98BU7=g-nh+ZAvRi6qD5Bt_Ow@mail.gmail.com>
Date: Thu, 6 Nov 2025 01:13:24 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next] net: add prefetch() in skb_defer_free_flush()
On Thu, Nov 6, 2025 at 1:05 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 11/6/25 9:55 AM, Eric Dumazet wrote:
> > skb_defer_free_flush() is becoming more important these days.
> >
> > Add a prefetch operation to reduce latency a bit on some
> > platforms like AMD EPYC 7B12.
> >
> > On more recent cpus, a stall happens when reading skb_shinfo().
> > Avoiding it will require a more elaborate strategy.
>
> For my education, how do you catch such stalls? looking for specific
> perf events? Or just based on cycles spent in a given function/chunk of
> code?
In this case, I was focusing on a NIC driver handling both RX and TX
from a single cpu.
I am using "perf record -g -C one_of_the_hot_cpu sleep 5; perf report --no-children".
I am working on an issue with napi_consume_skb(), which has no NUMA awareness.
With the following WIP series, I can push 115 Mpps of UDP packets
(instead of 80 Mpps) on IDPF.
I need more tests before pushing it for review, but the prefetch()
from skb_defer_free_flush() is a no-brainer.
git diff d24e4780d5783b8eecd33aab03bd4efd24703c65..
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b4bc8b1c7d5674c19b64f8b15685d74632048fe..7ac5f8aa1235a55db02b40b5a0f51bb3fa53fa03
100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1149,11 +1149,10 @@ void skb_release_head_state(struct sk_buff *skb)
skb);
#endif
+ skb->destructor = NULL;
}
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
- nf_conntrack_put(skb_nfct(skb));
-#endif
- skb_ext_put(skb);
+ nf_reset_ct(skb);
+ skb_ext_reset(skb);
}
/* Free everything but the sk_buff shell. */
@@ -1477,6 +1476,11 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
DEBUG_NET_WARN_ON_ONCE(!in_softirq());
+ if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+ skb_release_head_state(skb);
+ return skb_attempt_defer_free(skb);
+ }
+
if (!skb_unref(skb))
return;
commit df7dacc619117ebab7ea330ccc6390618f04dff3
Author: Eric Dumazet <edumazet@...gle.com>
Date: Wed Nov 5 17:02:20 2025 +0000
net: fix napi_consume_skb() with alien skbs
There is a lack of NUMA awareness and more generally lack
of slab caches affinity on TX completion path.
Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
in per-cpu caches so that they can be recycled in RX path.
Only allow this if the skb was allocated on the same cpu,
otherwise use skb_attempt_defer_free() so that the skb
is freed on the original cpu.
This removes contention on SLUB spinlocks and data structures.
After this patch, I get 40% improvement for a UDP tx workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).
80 Mpps -> 115 Mpps.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
commit 42593ad5f2bed6abd3a6cce3483e2980b114cbd9
Author: Eric Dumazet <edumazet@...gle.com>
Date: Wed Nov 5 16:50:29 2025 +0000
net: allow skb_release_head_state() to be called multiple times
Currently, only the skb dst is cleared (via skb_dst_drop()).
Make sure skb->destructor, conntrack state and extensions are also cleared.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>