lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <efa64c44b1e53510f972954e4670b2d8ecde97eb.camel@redhat.com>
Date:   Fri, 23 Sep 2022 09:41:29 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH net-next v2] net: skb: introduce and use a single page
 frag cache

Hello,

On Thu, 2022-09-22 at 14:17 -0700, Alexander Duyck wrote:
[...]
> My suggestion earlier was to just make the 1k cache a page_frag_cache.
> It will allow you to reuse the same structure members and a single
> pointer to track them. Waste would be minimal since the only real
> difference between the structures is about 8B for the structure, and
> odds are the napi_alloc_cache allocation is being rounded up anyway.
> 
> >         unsigned int skb_count;
> >         void *skb_cache[NAPI_SKB_CACHE_SIZE];
> >  };
> > @@ -143,6 +202,23 @@ struct napi_alloc_cache {
> >  static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
> >  static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
> > 
> > +/* Double check that napi_get_frags() allocates skbs with
> > + * skb->head being backed by slab, not a page fragment.
> > + * This is to make sure bug fixed in 3226b158e67c
> > + * ("net: avoid 32 x truesize under-estimation for tiny skbs")
> > + * does not accidentally come back.
> > + */
> > +void napi_get_frags_check(struct napi_struct *napi)
> > +{
> > +       struct sk_buff *skb;
> > +
> > +       local_bh_disable();
> > +       skb = napi_get_frags(napi);
> > +       WARN_ON_ONCE(!NAPI_HAS_SMALL_PAGE_FRAG && skb && skb->head_frag);
> > +       napi_free_frags(napi);
> > +       local_bh_enable();
> > +}
> > +
> >  void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
> >  {
> >         struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> > @@ -561,6 +637,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> >  {
> >         struct napi_alloc_cache *nc;
> >         struct sk_buff *skb;
> > +       bool pfmemalloc;
> 
> Rather than adding this I think you would be better off adding a
> struct page_frag_cache pointer. I will reference it here as "pfc".
> 
> >         void *data;
> > 
> >         DEBUG_NET_WARN_ON_ONCE(!in_softirq());
> > @@ -568,8 +645,10 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> > 
> >         /* If requested length is either too small or too big,
> >          * we use kmalloc() for skb->head allocation.
> > +        * When the small frag allocator is available, prefer it over kmalloc
> > +        * for small fragments
> >          */
> > -       if (len <= SKB_WITH_OVERHEAD(1024) ||
> > +       if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
> >             len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> >             (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> >                 skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
> > @@ -580,13 +659,30 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> >         }
> > 
> >         nc = this_cpu_ptr(&napi_alloc_cache);
> > -       len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > -       len = SKB_DATA_ALIGN(len);
> > 
> >         if (sk_memalloc_socks())
> >                 gfp_mask |= __GFP_MEMALLOC;
> > 
> > -       data = page_frag_alloc(&nc->page, len, gfp_mask);
> > +       if (NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) {
> 
> Then here you would add a line that would be:
> pfc = &nc->page_small;
> 
> > +               /* we are artificially inflating the allocation size, but
> > +                * that is not as bad as it may look like, as:
> > +                * - 'len' less then GRO_MAX_HEAD makes little sense
> > +                * - larger 'len' values lead to fragment size above 512 bytes
> > +                *   as per NAPI_HAS_SMALL_PAGE_FRAG definition
> > +                * - kmalloc would use the kmalloc-1k slab for such values
> > +                */
> > +               len = SZ_1K;
> > +
> > +               data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
> > +               pfmemalloc = nc->page_small.pfmemalloc;
> 
> Instead of setting pfmemalloc you could just update the line below. In
> addition you would just be passing pfc as the parameter.
> 
> > +       } else {
> 
> Likewise here you would have the line:
> pfc = &nc->page;

Probaly I was not clear in my previois email, sorry: before posting
this version I tried locally exactly all the above, and the generated
code with gcc 11.3.1 is a little bigger (a couple of instructions) than
what this version produces (with gcc 11.3.1-2). It has the same number
of conditionals and a slightly larger napi_alloc_cache.

Additionally the suggested alternative needs more pre-processor
conditionals to handle the !NAPI_HAS_SMALL_PAGE_FRAG case - avoiding
adding a 2nd, unused in that case, page_frag_cache.

[...]
> 
> > @@ -596,7 +692,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> >                 return NULL;
> >         }
> > 
> > -       if (nc->page.pfmemalloc)
> > +       if (pfmemalloc)
> 
> Instead of passing pfmemalloc you could just check pfc->pfmemalloc.
> Alternatively I wonder if it wouldn't be faster to just set the value
> directly based on frag_cache->pfmemalloc and avoid the conditional
> heck entirely.

Note that:

	skb->pfmemalloc = pfmemalloc;

avoids a branch but requires more instructions than the current code
(verified with gcc). The gain looks doubtful?!? Additionally we have
statement alike:

	if (<pfmemalloc>)
		skb->pfmemalloc = 1;

in other functions in skbuff.c - still fast-path - and would be better
updating all the places together for consistency - if that is really
considered an improvement. IMHO it should at least land in a different
patch.
	
I'll post a v3 with your updated email address, but I think the current
code is the better option.

Cheers,

Paolo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ