Message-ID: <fa1c57de-52f6-719f-7298-c606c119d1ab@redhat.com>
Date: Mon, 9 Jan 2023 13:24:54 +0100
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: brouer@...hat.com, netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>, edumazet@...gle.com,
pabeni@...hat.com
Subject: Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
On 06/01/2023 23.33, Jakub Kicinski wrote:
> Hi!
>
> Would it not be better to try to actually defer them (queue to
> the deferred free list and try to ship back to the NAPI cache of
> the allocating core)?
> Is the spin lock on the defer list problematic
> for forwarding cases (which I'm assuming is your target)?
We might be talking past each other. The NAPI cache, to me, is the
per-CPU napi_alloc_cache (this_cpu_ptr(&napi_alloc_cache)). This
napi_alloc_cache doesn't use a spin_lock, but depends on being protected
by NAPI context. The code in this patch closely resembles how the
napi_alloc_cache works. See code: napi_consume_skb() and
__kfree_skb_defer().
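For reference, the napi_alloc_cache bulk-free pattern looks roughly like
this (simplified sketch from memory of net/core/skbuff.c; KASAN handling
and other details left out):

  struct napi_alloc_cache {
          struct page_frag_cache page;
          unsigned int skb_count;
          void *skb_cache[NAPI_SKB_CACHE_SIZE];
  };

  static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);

  /* Stash an skb head in the per-CPU cache; once the cache is full,
   * hand half of it back to SLUB in a single bulk call.  No spin_lock
   * needed, because this only runs in NAPI (softirq) context.
   */
  static void napi_skb_cache_put(struct sk_buff *skb)
  {
          struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);

          nc->skb_cache[nc->skb_count++] = skb;

          if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
                  kmem_cache_free_bulk(skbuff_head_cache,
                                       NAPI_SKB_CACHE_HALF,
                                       nc->skb_cache + NAPI_SKB_CACHE_HALF);
                  nc->skb_count = NAPI_SKB_CACHE_HALF;
          }
  }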
> Also the lack of perf numbers is a bit of a red flag.
>
I have run performance tests, but as I tried to explain in the
cover letter, for the qdisc use-case this code path is only activated
when we have overflow at enqueue. Thus, this doesn't translate directly
into performance numbers, as the TX-qdisc is 100% full because the
hardware device is backed up, and this patch simply makes us spend less
time on freeing memory.
I have been using the pktgen script ./pktgen_bench_xmit_mode_queue_xmit.sh,
which can inject packets at the qdisc layer (invoking __dev_queue_xmit),
and then used perf-record to see that the overhead of SLUB (__slab_free
is top #4) is reduced.
> On Thu, 05 Jan 2023 16:42:47 +0100 Jesper Dangaard Brouer wrote:
>> +static void kfree_skb_defer_local(struct sk_buff *skb,
>> +                                  struct skb_free_array *sa,
>> +                                  enum skb_drop_reason reason)
>
> If we wanna keep the implementation as is - I think we should rename
> the thing to say "bulk" rather than "defer" to avoid confusion with
> the TCP "defer to allocating core" scheme...
I named it "defer" because the NAPI cache uses "defer", specifically in
the func name __kfree_skb_defer(), which is why I chose
kfree_skb_defer_local(), as this patch uses a similar scheme.
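To illustrate why I find "defer" fitting: the helper does the same local
bulking as the NAPI cache, just with an array on the caller's stack.
Roughly this shape (sketch from memory, not the exact patch code; the
bulk-size constant here is only illustrative):

  #define KFREE_SKB_BULK_SIZE     16      /* illustrative value */

  struct skb_free_array {
          unsigned int skb_count;
          void *skb_array[KFREE_SKB_BULK_SIZE];
  };

  static void kfree_skb_defer_local(struct sk_buff *skb,
                                    struct skb_free_array *sa,
                                    enum skb_drop_reason reason)
  {
          /* The per-skb teardown (releasing data/frags, where @reason
           * is consumed) is omitted in this sketch; the point here is
           * only the bulked free of the skb heads themselves.
           */
          sa->skb_array[sa->skb_count++] = skb;

          if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) {
                  kmem_cache_free_bulk(skbuff_head_cache,
                                       KFREE_SKB_BULK_SIZE, sa->skb_array);
                  sa->skb_count = 0;
          }
  }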
I'm not sure what is meant by 'TCP's "defer to allocating core" scheme'.
Looking at the code, I guess you are referring to skb_attempt_defer_free()
and skb_defer_free_flush().
Calling skb_attempt_defer_free() for every SKB would be too costly
because of the expensive spin_lock_irqsave() (+ restore). I see that
skb_defer_free_flush() can be improved to use spin_lock_irq() (avoiding
mangling CPU flags). And skb_defer_free_flush() (which gets called from
RX-NAPI/net_rx_action) ends up calling napi_consume_skb(), which in turn
calls kmem_cache_free_bulk() (which I also do, just more directly).
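Concretely, the improvement I have in mind is something like this
(untested sketch of the existing function in net/core/dev.c with the lock
calls changed; since the callers run with IRQs enabled, the flags
save/restore should not be needed):

  static void skb_defer_free_flush(struct softnet_data *sd)
  {
          struct sk_buff *skb, *next;

          /* Paired with WRITE_ONCE() in skb_attempt_defer_free() */
          if (!READ_ONCE(sd->defer_count))
                  return;

          /* Change: spin_lock_irq() instead of spin_lock_irqsave(),
           * as callers (net_rx_action) have IRQs enabled.
           */
          spin_lock_irq(&sd->defer_lock);
          skb = sd->defer_list;
          sd->defer_list = NULL;
          sd->defer_count = 0;
          spin_unlock_irq(&sd->defer_lock);

          while (skb != NULL) {
                  next = skb->next;
                  napi_consume_skb(skb, 1);
                  skb = next;
          }
  }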
>
> kfree_skb_list_bulk() ?
Hmm, IMHO it is not really worth changing the function name.
kfree_skb_list() is called in more places (than the qdisc
enqueue-overflow case), all of which automatically benefit if we keep
the function name kfree_skb_list().
--Jesper