netdev - Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <fa1c57de-52f6-719f-7298-c606c119d1ab@redhat.com>
Date:   Mon, 9 Jan 2023 13:24:54 +0100
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     brouer@...hat.com, netdev@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>, edumazet@...gle.com,
        pabeni@...hat.com
Subject: Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk

On 06/01/2023 23.33, Jakub Kicinski wrote:
> Hi!
> 
> Would it not be better to try to actually defer them (queue to
> the deferred free list and try to ship back to the NAPI cache of
> the allocating core)? 
> Is the spin lock on the defer list problematic
> for fowarding cases (which I'm assuming your target)?

We might be talking past each-other.  As the NAPI cache for me
is the per CPU napi_alloc_cache (this_cpu_ptr(&napi_alloc_cache);)

This napi_alloc_cache doesn't use a spin_lock, but depend on being
protected by NAPI context.  The code in this patch closely resembles how
the napi_alloc_cache works.  See code: napi_consume_skb() and
__kfree_skb_defer().

> Also the lack of perf numbers is a bit of a red flag.
>

I have run performance tests, but as I tried to explain in the
cover letter, for the qdisc use-case this code path is only activated
when we have overflow at enqueue.  Thus, this doesn't translate directly
into a performance numbers, as TX-qdisc is 100% full caused by hardware
device being backed up, and this patch makes us use less time on freeing
memory.

I have been using pktgen script ./pktgen_bench_xmit_mode_queue_xmit.sh
which can inject packets at the qdisc layer (invoking __dev_queue_xmit).
And then used perf-record to see overhead of SLUB (__slab_free is top#4)
is reduced.

> On Thu, 05 Jan 2023 16:42:47 +0100 Jesper Dangaard Brouer wrote:
>> +static void kfree_skb_defer_local(struct sk_buff *skb,
>> +				  struct skb_free_array *sa,
>> +				  enum skb_drop_reason reason)
> 
> If we wanna keep the implementation as is - I think we should rename
> the thing to say "bulk" rather than "defer" to avoid confusion with
> the TCP's "defer to allocating core" scheme..

I named it "defer" because the NAPI cache uses "defer" specifically func
name __kfree_skb_defer() why I choose kfree_skb_defer_local(), as this
patch uses similar scheme.

I'm not sure what is meant by 'TCP's "defer to allocating core" scheme'.
Looking at code I guess you are referring to skb_attempt_defer_free()
and skb_defer_free_flush().

It would be too high cost calling skb_attempt_defer_free() for every SKB
because of the expensive spin_lock_irqsave() (+ restore).  I see the
skb_defer_free_flush() can be improved to use spin_lock_irq() (avoiding
mangling CPU flags).  And skb_defer_free_flush() (which gets called from
RX-NAPI/net_rx_action) end up calling napi_consume_skb() that endup
calling kmem_cache_free_bulk() (which I also do, just more directly).

> 
> kfree_skb_list_bulk() ?

Hmm, IMHO not really worth changing the function name.  The
kfree_skb_list() is called in more places, (than qdisc enqueue-overflow
case), which automatically benefits if we keep the function name
kfree_skb_list().

--Jesper