linux-kernel - Re: [v3 net-next 08/10] skbuff: reuse NAPI skb cache on allocation path (__build

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210210122414.8064-1-alobakin@pm.me>
Date:   Wed, 10 Feb 2021 12:25:04 +0000
From:   Alexander Lobakin <alobakin@...me>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Alexander Lobakin <alobakin@...me>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Kevin Hao <haokexin@...il.com>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Marco Elver <elver@...gle.com>,
        Dexuan Cui <decui@...rosoft.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        Taehee Yoo <ap420073@...il.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Björn Töpel <bjorn@...nel.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Guillaume Nault <gnault@...hat.com>,
        Yonghong Song <yhs@...com>, zhudi <zhudi21@...wei.com>,
        Michal Kubecek <mkubecek@...e.cz>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Yang Yingliang <yangyingliang@...wei.com>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        Edward Cree <ecree.xilinx@...il.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: [v3 net-next 08/10] skbuff: reuse NAPI skb cache on allocation path (__build_skb())

From: Paolo Abeni <pabeni@...hat.com>
Date: Wed, 10 Feb 2021 11:21:06 +0100

> Hello,

Hi!
 
> I'm sorry for the late feedback, I could not step-in before.
> 
> Also adding Jesper for awareness, as he introduced the bulk free
> infrastructure.
> 
> On Tue, 2021-02-09 at 20:48 +0000, Alexander Lobakin wrote:
> > @@ -231,7 +256,7 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size)
> >   */
> >  struct sk_buff *build_skb(void *data, unsigned int frag_size)
> >  {
> > -	struct sk_buff *skb = __build_skb(data, frag_size);
> > +	struct sk_buff *skb = __build_skb(data, frag_size, true);
> 
> I must admit I'm a bit scared of this. There are several high speed
> device drivers that will move to bulk allocation, and we don't have any
> performance figure for them.
> 
> In my experience with (low end) MIPS board, cache misses cost tend to
> be much less visible there compared to reasonably recent server H/W,
> because the CPU/memory access time difference is much lower.
> 
> When moving to higher end H/W the performance gain you measured could
> be completely countered by less optimal cache usage.
> 
> I fear also latency spikes - I'm unsure if a 32 skbs allocation vs a
> single skb would be visible e.g. in a round-robin test. Generally
> speaking bulk allocating 32 skbs looks a bit too much. IIRC, when
> Edward added listification to GRO, he did several measures with
> different list size and found 8 to be the optimal value (for the tested
> workload). Above such number the list become too big and the pressure
> on the cache outweighted the bulking benefits.

I can change to logics the way so it would allocate the first 8.
I think I've already seen this batch value somewhere in XDP code,
so this might be a balanced one.

Regarding bulk-freeing: can the batch size make sense when freeing
or it's okay to wipe 32 (currently 64 in baseline) in a row?

> Perhaps giving the device drivers the ability to opt-in on this infra
> via a new helper - as done back then with napi_consume_skb() - would
> make this change safer?

That's actually a very nice idea. There's only a little in the code
to change to introduce an ability to take heads from the cache
optionally. This way developers could switch to it when needed.

Thanks for the suggestions! I'll definitely absorb them into the code
and give it a test.

> > @@ -838,31 +863,31 @@ void __consume_stateless_skb(struct sk_buff *skb)
> >  	kfree_skbmem(skb);
> >  }
> >
> > -static inline void _kfree_skb_defer(struct sk_buff *skb)
> > +static void napi_skb_cache_put(struct sk_buff *skb)
> >  {
> >  	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> > +	u32 i;
> >
> >  	/* drop skb->head and call any destructors for packet */
> >  	skb_release_all(skb);
> >
> > -	/* record skb to CPU local list */
> > +	kasan_poison_object_data(skbuff_head_cache, skb);
> >  	nc->skb_cache[nc->skb_count++] = skb;
> >
> > -#ifdef CONFIG_SLUB
> > -	/* SLUB writes into objects when freeing */
> > -	prefetchw(skb);
> > -#endif
> 
> It looks like this chunk has been lost. Is that intentional?

Yep. This prefetchw() assumed that skbuff_heads will be wiped
immediately or at the end of network softirq. Reusing this cache
means that heads can be reused later or may be kept in a cache for
some time, so prefetching makes no sense anymore.

> Thanks!
> 
> Paolo

Al