linux-kernel - Re: [v3 net-next 08/10] skbuff: reuse NAPI skb cache on allocation path (__build

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b6efe8d3a4ebf8188c040c5401b50b6c11b6eaf9.camel@redhat.com>
Date:   Wed, 10 Feb 2021 11:21:06 +0100
From:   Paolo Abeni <pabeni@...hat.com>
To:     Alexander Lobakin <alobakin@...me>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Jonathan Lemon <jonathan.lemon@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Kevin Hao <haokexin@...il.com>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Marco Elver <elver@...gle.com>,
        Dexuan Cui <decui@...rosoft.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        Taehee Yoo <ap420073@...il.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Björn Töpel <bjorn@...nel.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Guillaume Nault <gnault@...hat.com>,
        Yonghong Song <yhs@...com>, zhudi <zhudi21@...wei.com>,
        Michal Kubecek <mkubecek@...e.cz>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Yang Yingliang <yangyingliang@...wei.com>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        Edward Cree <ecree.xilinx@...il.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: [v3 net-next 08/10] skbuff: reuse NAPI skb cache on allocation
 path (__build_skb())

Hello,

I'm sorry for the late feedback, I could not step-in before.

Also adding Jesper for awareness, as he introduced the bulk free
infrastructure.

On Tue, 2021-02-09 at 20:48 +0000, Alexander Lobakin wrote:
> @@ -231,7 +256,7 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size)
>   */
>  struct sk_buff *build_skb(void *data, unsigned int frag_size)
>  {
> -	struct sk_buff *skb = __build_skb(data, frag_size);
> +	struct sk_buff *skb = __build_skb(data, frag_size, true);

I must admit I'm a bit scared of this. There are several high speed
device drivers that will move to bulk allocation, and we don't have any
performance figure for them.

In my experience with (low end) MIPS board, cache misses cost tend to
be much less visible there compared to reasonably recent server H/W,
because the CPU/memory access time difference is much lower.

When moving to higher end H/W the performance gain you measured could
be completely countered by less optimal cache usage.

I fear also latency spikes - I'm unsure if a 32 skbs allocation vs a
single skb would be visible e.g. in a round-robin test. Generally
speaking bulk allocating 32 skbs looks a bit too much. IIRC, when
Edward added listification to GRO, he did several measures with
different list size and found 8 to be the optimal value (for the tested
workload). Above such number the list become too big and the pressure
on the cache outweighted the bulking benefits.

Perhaps giving the device drivers the ability to opt-in on this infra
via a new helper - as done back then with napi_consume_skb() - would
make this change safer?

> @@ -838,31 +863,31 @@ void __consume_stateless_skb(struct sk_buff *skb)
>  	kfree_skbmem(skb);
>  }
>  
> -static inline void _kfree_skb_defer(struct sk_buff *skb)
> +static void napi_skb_cache_put(struct sk_buff *skb)
>  {
>  	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> +	u32 i;
>  
>  	/* drop skb->head and call any destructors for packet */
>  	skb_release_all(skb);
>  
> -	/* record skb to CPU local list */
> +	kasan_poison_object_data(skbuff_head_cache, skb);
>  	nc->skb_cache[nc->skb_count++] = skb;
>  
> -#ifdef CONFIG_SLUB
> -	/* SLUB writes into objects when freeing */
> -	prefetchw(skb);
> -#endif

It looks like this chunk has been lost. Is that intentional?

Thanks!

Paolo