lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Fri, 26 Apr 2024 20:50:49 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: David J Wilder <dwilder@...ibm.com>
Cc: dwilder@...ux.ibm.com, netdev@...r.kernel.org, pabeni@...hat.com, 
	sd@...asysnail.net, wilder@...ibm.com
Subject: Re: [RFC PATCH] net: skb: Increasing allocation in __napi_alloc_skb()
 to 2k when needed.

On Fri, Apr 26, 2024 at 8:28 PM David J Wilder <dwilder@...ibm.com> wrote:
>
> On Wed, Apr 24, 2024 at 10:49 PM Sabrina Dubroca <sd@...asysnail.net> wrote:
> >>
> >> 2024-04-23, 09:56:33 +0200, Paolo Abeni wrote:
> >> > On Fri, 2024-04-19 at 15:23 -0700, David J Wilder wrote:
> >> > > When testing CONFIG_MAX_SKB_FRAGS=45 on ppc64le and x86_64 I ran into a
> >> > > couple of issues.
> >> > >
> >> > > __napi_alloc_skb() assumes its smallest fragment allocations will fit in
> >> > > 1K. When CONFIG_MAX_SKB_FRAGS is increased this may no longer be true
> >> > > resulting in __napi_alloc_skb() reverting to using page_frag_alloc().
> >> > > This results in the return of the bug fixed in:
> >> > > Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for
> >> > > tiny skbs")
> >> > >
> >> > > That commit insured that "small skb head fragments are kmalloc backed,
> >> > > so that other objects in the slab page can be reused instead of being held
> >> > > as long as skbs are sitting in socket queues."
> >> > >
> >> > > On ppc64le the warning from napi_get_frags_check() is displayed when
> >> > > CONFIG_MAX_SKB_FRAGS is set to 45. The purpose of the warning is to detect
> >> > > when an increase of MAX_SKB_FRAGS has reintroduced the aforementioned bug.
> >> > > Unfortunately on x86_64 this warning is not seen, even though it should be.
> >> > > I found the warning was disabled by:
> >> > > commit dbae2b062824 ("net: skb: introduce and use a single page frag
> >> > > cache")
> >> > >
> >> > > This RFC patch to __napi_alloc_skb() determines if an skbuff allocation
> >> > > with a head fragment of size GRO_MAX_HEAD will fit in a 1k allocation,
> >> > > increasing the allocation to 2k if needed.
> >> > >
> >> > > I have functionally tested this patch, performance testing is still needed.
> >> > >
> >> > > TBD: Remove the limitation on 4k page size from the single page frag cache
> >> > > allowing ppc64le (64K page size) to benefit from this change.
> >> > >
> >> > > TBD: I have not address the warning in napi_get_frags_check() on x86_64.
> >> > > Will the warning still be needed once the other changes are completed?
> >> >
> >> >
> >> > Thanks for the detailed analysis.
> >> >
> >> > As mentioned by Eric in commit
> >> > bf9f1baa279f0758dc2297080360c5a616843927, it should be now possible to
> >> > revert dbae2b062824 without incurring in performance regressions for
> >> > the relevant use-case. I had that on my todo list since a lot of time,
> >> > but I was unable to allocate time for that.
> >> >
> >> > I think such revert would be preferable. Would you be able to evaluate
> >> > such option?
>
> Thanks Paolo,  yes, I can evaluate removing dbae2b062824.
>
> >>
> >> I don't think reverting dbae2b062824 would fix David's issue.
> >>
> >> The problem is that with MAX_SKB_FRAGS=45, skb_shared_info becomes
> >> huge, so 1024 is not enough for those small packets, and we use a
> >> pagefrag instead of kmalloc, which makes napi_get_frags_check unhappy.
> >>
> >
> > 768 bytes is not huge ...
> >
> >> Even after reverting dbae2b062824, we would still go through the
> >> pagefrag path and not __alloc_skb.
> >>
> >> What about something like this?  (boot-tested on x86 only, but I
> >> disabled NAPI_HAS_SMALL_PAGE_FRAG. no perf testing at all.)
> >>
> >> -------- 8< --------
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index f85e6989c36c..88923b7b64fe 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -108,6 +108,8 @@ static struct kmem_cache *skbuff_ext_cache __ro_after_init;
> >>  #define SKB_SMALL_HEAD_HEADROOM                                                \
> >>         SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
> >>
> >> +#define SKB_SMALL_HEAD_THRESHOLD (SKB_SMALL_HEAD_HEADROOM + NET_SKB_PAD + NET_IP_ALIGN)
> >> +
> >>  int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
> >>  EXPORT_SYMBOL(sysctl_max_skb_frags);
> >>
> >> @@ -726,7 +728,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
> >>         /* If requested length is either too small or too big,
> >>          * we use kmalloc() for skb->head allocation.
> >>          */
> >> -       if (len <= SKB_WITH_OVERHEAD(1024) ||
> >> +       if (len <= SKB_SMALL_HEAD_THRESHOLD ||
> >>             len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> >>             (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> >>                 skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
> >> @@ -802,7 +804,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
> >>          * When the small frag allocator is available, prefer it over kmalloc
> >>          * for small fragments
> >>          */
> >> -       if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
> >> +       if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_SMALL_HEAD_THRESHOLD) ||
> >>             len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> >>             (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> >>                 skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
> >> -------- 8< --------
> >>
> >> (__)napi_alloc_skb extends the GRO_MAX_HEAD size by NET_SKB_PAD +
> >> NET_IP_ALIGN, so I added them here as well. Mainly this is reusing a
> >> size that we know if big enough to fit a small header and whatever
> >> size skb_shared_info is on the current build. Maybe this could be
> >> max(SKB_WITH_OVERHEAD(1024), <...>) to preserve the current behavior
> >> on MAX_SKB_FRAGS=17, since in that case
> >> SKB_WITH_OVERHEAD(1024) > SKB_SMALL_HEAD_HEADROOM
> >>
> >>
>
> Thanks for the suggestion Sabrina, I will incorporate it.
>
> >
> > Here we simply use
> >
> > #define GRO_MAX_HEAD 192
>
> Sorry Eric, where did 192 come from?

99.9999 % of packets have a total header size smaller than 192 bytes.

Current linux definition is too big IMO

#define GRO_MAX_HEAD (MAX_HEADER + 128)

Depending on various CONFIG options, MAX_HEADER can be 176, and final
GRO_MAX_HEAD can be 304

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ