lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 11 Feb 2021 11:16:40 +0100
From:   Paolo Abeni <pabeni@...hat.com>
To:     Alexander Lobakin <alobakin@...me>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Jonathan Lemon <jonathan.lemon@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Kevin Hao <haokexin@...il.com>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Marco Elver <elver@...gle.com>,
        Dexuan Cui <decui@...rosoft.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        Taehee Yoo <ap420073@...il.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Björn Töpel <bjorn@...nel.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Guillaume Nault <gnault@...hat.com>,
        Yonghong Song <yhs@...com>, zhudi <zhudi21@...wei.com>,
        Michal Kubecek <mkubecek@...e.cz>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Yang Yingliang <yangyingliang@...wei.com>,
        Florian Westphal <fw@...len.de>,
        Edward Cree <ecree.xilinx@...il.com>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH v4 net-next 09/11] skbuff: allow to optionally use NAPI
 cache from __alloc_skb()

On Wed, 2021-02-10 at 16:30 +0000, Alexander Lobakin wrote:
> Reuse the old and forgotten SKB_ALLOC_NAPI to add an option to get
> an skbuff_head from the NAPI cache instead of inplace allocation
> inside __alloc_skb().
> This implies that the function is called from softirq or BH-off
> context, not for allocating a clone or from a distant node.
> 
> Signed-off-by: Alexander Lobakin <alobakin@...me>
> ---
>  net/core/skbuff.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 9e1a8ded4acc..750fa1825b28 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -397,15 +397,20 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  	struct sk_buff *skb;
>  	u8 *data;
>  	bool pfmemalloc;
> +	bool clone;
>  
> -	cache = (flags & SKB_ALLOC_FCLONE)
> -		? skbuff_fclone_cache : skbuff_head_cache;
> +	clone = !!(flags & SKB_ALLOC_FCLONE);
> +	cache = clone ? skbuff_fclone_cache : skbuff_head_cache;
>  
>  	if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
>  		gfp_mask |= __GFP_MEMALLOC;
>  
>  	/* Get the HEAD */
> -	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
> +	if (!clone && (flags & SKB_ALLOC_NAPI) &&
> +	    likely(node == NUMA_NO_NODE || node == numa_mem_id()))
> +		skb = napi_skb_cache_get();
> +	else
> +		skb = kmem_cache_alloc_node(cache, gfp_mask & ~GFP_DMA, node);
>  	if (unlikely(!skb))
>  		return NULL;
>  	prefetchw(skb);

I hope the opt-in thing would have allowed leaving this code unchanged.
I see it's not trivial avoid touching this code path.
Still I think it would be nice if you would be able to let the device
driver use the cache without touching the above, which is also used
e.g. by the TCP xmit path, which in turn will not leverage the cache
(as it requires FCLONE skbs).

If I read correctly, the above chunk is needed to
allow __napi_alloc_skb() access the cache even for small skb
allocation. Good device drivers should not call alloc_skb() in the fast
path.

What about changing __napi_alloc_skb() to always use
the __napi_build_skb(), for both kmalloc and page backed skbs? That is,
always doing the 'data' allocation in __napi_alloc_skb() - either via
page_frag or via kmalloc() - and than call __napi_build_skb().

I think that should avoid adding more checks in __alloc_skb() and
should probably reduce the number of conditional used
by __napi_alloc_skb().

Thanks!

Paolo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ