Message-ID: <efa64c44b1e53510f972954e4670b2d8ecde97eb.camel@redhat.com>
Date: Fri, 23 Sep 2022 09:41:29 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH net-next v2] net: skb: introduce and use a single page
frag cache
Hello,
On Thu, 2022-09-22 at 14:17 -0700, Alexander Duyck wrote:
[...]
> My suggestion earlier was to just make the 1k cache a page_frag_cache.
> It will allow you to reuse the same structure members and a single
> pointer to track them. Waste would be minimal since the only real
> difference between the structures is about 8B for the structure, and
> odds are the napi_alloc_cache allocation is being rounded up anyway.
>
> > unsigned int skb_count;
> > void *skb_cache[NAPI_SKB_CACHE_SIZE];
> > };
> > @@ -143,6 +202,23 @@ struct napi_alloc_cache {
> > static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
> > static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
> >
> > +/* Double check that napi_get_frags() allocates skbs with
> > + * skb->head being backed by slab, not a page fragment.
> > + * This is to make sure bug fixed in 3226b158e67c
> > + * ("net: avoid 32 x truesize under-estimation for tiny skbs")
> > + * does not accidentally come back.
> > + */
> > +void napi_get_frags_check(struct napi_struct *napi)
> > +{
> > + struct sk_buff *skb;
> > +
> > + local_bh_disable();
> > + skb = napi_get_frags(napi);
> > + WARN_ON_ONCE(!NAPI_HAS_SMALL_PAGE_FRAG && skb && skb->head_frag);
> > + napi_free_frags(napi);
> > + local_bh_enable();
> > +}
> > +
> > void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
> > {
> > struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> > @@ -561,6 +637,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> > {
> > struct napi_alloc_cache *nc;
> > struct sk_buff *skb;
> > + bool pfmemalloc;
>
> Rather than adding this I think you would be better off adding a
> struct page_frag_cache pointer. I will reference it here as "pfc".
>
> > void *data;
> >
> > DEBUG_NET_WARN_ON_ONCE(!in_softirq());
> > @@ -568,8 +645,10 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> >
> > /* If requested length is either too small or too big,
> > * we use kmalloc() for skb->head allocation.
> > + * When the small frag allocator is available, prefer it over kmalloc
> > + * for small fragments
> > */
> > - if (len <= SKB_WITH_OVERHEAD(1024) ||
> > + if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
> > len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> > (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> > skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
> > @@ -580,13 +659,30 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> > }
> >
> > nc = this_cpu_ptr(&napi_alloc_cache);
> > - len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > - len = SKB_DATA_ALIGN(len);
> >
> > if (sk_memalloc_socks())
> > gfp_mask |= __GFP_MEMALLOC;
> >
> > - data = page_frag_alloc(&nc->page, len, gfp_mask);
> > + if (NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) {
>
> Then here you would add a line that would be:
> pfc = &nc->page_small;
>
> > + /* we are artificially inflating the allocation size, but
> > + * that is not as bad as it may look like, as:
> > + * - 'len' less then GRO_MAX_HEAD makes little sense
> > + * - larger 'len' values lead to fragment size above 512 bytes
> > + * as per NAPI_HAS_SMALL_PAGE_FRAG definition
> > + * - kmalloc would use the kmalloc-1k slab for such values
> > + */
> > + len = SZ_1K;
> > +
> > + data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
> > + pfmemalloc = nc->page_small.pfmemalloc;
>
> Instead of setting pfmemalloc you could just update the line below. In
> addition you would just be passing pfc as the parameter.
>
> > + } else {
>
> Likewise here you would have the line:
> pfc = &nc->page;
Probably I was not clear in my previous email, sorry: before posting
this version I tried locally exactly all of the above, and the code
generated by gcc 11.3.1-2 is slightly larger (a couple of instructions)
than what this version produces. It has the same number of conditionals
and a slightly larger napi_alloc_cache.
Additionally, the suggested alternative needs more pre-processor
conditionals to handle the !NAPI_HAS_SMALL_PAGE_FRAG case, to avoid
adding a second page_frag_cache that would be unused there.
[...]
>
> > @@ -596,7 +692,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> > return NULL;
> > }
> >
> > - if (nc->page.pfmemalloc)
> > + if (pfmemalloc)
>
> Instead of passing pfmemalloc you could just check pfc->pfmemalloc.
> Alternatively I wonder if it wouldn't be faster to just set the value
> directly based on frag_cache->pfmemalloc and avoid the conditional
> heck entirely.
Note that:
	skb->pfmemalloc = pfmemalloc;
avoids a branch but requires more instructions than the current code
(verified with gcc), so the gain looks doubtful. Additionally, we have
statements like:
	if (<pfmemalloc>)
		skb->pfmemalloc = 1;
in other functions in skbuff.c - still fast-path - and it would be
better to update all those places together for consistency - if that is
really considered an improvement. IMHO it should at least land in a
different patch.
I'll post a v3 with your updated email address, but I think the current
code is the better option.
Cheers,
Paolo