Message-ID: <20210211152620.3339-1-alobakin@pm.me>
Date:   Thu, 11 Feb 2021 15:26:46 +0000
From:   Alexander Lobakin <alobakin@...me>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Alexander Lobakin <alobakin@...me>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Kevin Hao <haokexin@...il.com>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Marco Elver <elver@...gle.com>,
        Dexuan Cui <decui@...rosoft.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        Taehee Yoo <ap420073@...il.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Björn Töpel <bjorn@...nel.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Guillaume Nault <gnault@...hat.com>,
        Yonghong Song <yhs@...com>, zhudi <zhudi21@...wei.com>,
        Michal Kubecek <mkubecek@...e.cz>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Yang Yingliang <yangyingliang@...wei.com>,
        Florian Westphal <fw@...len.de>,
        Edward Cree <ecree.xilinx@...il.com>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb()

From: Paolo Abeni <pabeni@...hat.com>
Date: Thu, 11 Feb 2021 15:55:04 +0100

> On Thu, 2021-02-11 at 14:28 +0000, Alexander Lobakin wrote:
> > From: Paolo Abeni <pabeni@...hat.com> on Thu, 11 Feb 2021 11:16:40 +0100 wrote:
> > > What about changing __napi_alloc_skb() to always use
> > > the __napi_build_skb(), for both kmalloc and page backed skbs? That is,
> > > always doing the 'data' allocation in __napi_alloc_skb() - either via
> > > page_frag or via kmalloc() - and then call __napi_build_skb().
> > > 
> > > I think that should avoid adding more checks in __alloc_skb() and
> > > should probably reduce the number of conditionals used
> > > by __napi_alloc_skb().
> > 
> > I thought of this too. But this will introduce a conditional branch
> > to set skb->head_frag or not. So, one branch less in __alloc_skb(),
> > one branch more here, and we also lose the ability to call
> > __alloc_skb() with a decached head.
> 
> Just to try to be clear, I mean something alike the following (not even
> build tested). In the fast path it has less branches than the current
> code - for both kmalloc and page_frag allocation.
> 
> ---
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 785daff48030..a242fbe4730e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -506,23 +506,12 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>  				 gfp_t gfp_mask)
>  {
>  	struct napi_alloc_cache *nc;
> +	bool head_frag, pfmemalloc;
>  	struct sk_buff *skb;
>  	void *data;
>  
>  	len += NET_SKB_PAD + NET_IP_ALIGN;
>  
> -	/* If requested length is either too small or too big,
> -	 * we use kmalloc() for skb->head allocation.
> -	 */
> -	if (len <= SKB_WITH_OVERHEAD(1024) ||
> -	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> -	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> -		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
> -		if (!skb)
> -			goto skb_fail;
> -		goto skb_success;
> -	}
> -
>  	nc = this_cpu_ptr(&napi_alloc_cache);
>  	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  	len = SKB_DATA_ALIGN(len);
> @@ -530,25 +519,34 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>  	if (sk_memalloc_socks())
>  		gfp_mask |= __GFP_MEMALLOC;
>  
> -	data = page_frag_alloc(&nc->page, len, gfp_mask);
> +	if (len <= SKB_WITH_OVERHEAD(1024) ||
> +            len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> +            (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> +		data = kmalloc_reserve(len, gfp_mask, NUMA_NO_NODE, &pfmemalloc);
> +		head_frag = 0;
> +		len = 0;
> +	} else {
> +		data = page_frag_alloc(&nc->page, len, gfp_mask);
> +		pfmemalloc = nc->page.pfmemalloc;
> +		head_frag = 1;
> +	}
>  	if (unlikely(!data))
>  		return NULL;

Sure. I have a separate WIP series that reworks all three *alloc_skb()
functions, as there's nice room for optimization, especially now that
tiny skbs fall back to __alloc_skb().
It will likely hit the mailing lists after the merge window and the
next net-next season, not now. And it's not really connected to NAPI
cache reuse.

>  	skb = __build_skb(data, len);
>  	if (unlikely(!skb)) {
> -		skb_free_frag(data);
> +		if (head_frag)
> +			skb_free_frag(data);
> +		else
> +			kfree(data);
>  		return NULL;
>  	}
>  
> -	if (nc->page.pfmemalloc)
> -		skb->pfmemalloc = 1;
> -	skb->head_frag = 1;
> +	skb->pfmemalloc = pfmemalloc;
> +	skb->head_frag = head_frag;
>  
> -skb_success:
>  	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>  	skb->dev = napi->dev;
> -
> -skb_fail:
>  	return skb;
>  }
>  EXPORT_SYMBOL(__napi_alloc_skb);

Al