Message-ID: <ZilwPJ3QZC-7ideG@hog>
Date: Wed, 24 Apr 2024 22:49:00 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Paolo Abeni <pabeni@...hat.com>
Cc: David J Wilder <dwilder@...ibm.com>, netdev@...r.kernel.org,
edumazet@...gle.com
Subject: Re: [RFC PATCH] net: skb: Increasing allocation in
__napi_alloc_skb() to 2k when needed.
2024-04-23, 09:56:33 +0200, Paolo Abeni wrote:
> On Fri, 2024-04-19 at 15:23 -0700, David J Wilder wrote:
> > When testing CONFIG_MAX_SKB_FRAGS=45 on ppc64le and x86_64 I ran into a
> > couple of issues.
> >
> > __napi_alloc_skb() assumes its smallest fragment allocations will fit in
> > 1K. When CONFIG_MAX_SKB_FRAGS is increased this may no longer be true
> > resulting in __napi_alloc_skb() reverting to using page_frag_alloc().
> > This results in the return of the bug fixed in:
> > Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for
> > tiny skbs")
> >
> > That commit ensured that "small skb head fragments are kmalloc backed,
> > so that other objects in the slab page can be reused instead of being held
> > as long as skbs are sitting in socket queues."
> >
> > On ppc64le the warning from napi_get_frags_check() is displayed when
> > CONFIG_MAX_SKB_FRAGS is set to 45. The purpose of the warning is to detect
> > when an increase of MAX_SKB_FRAGS has reintroduced the aforementioned bug.
> > Unfortunately on x86_64 this warning is not seen, even though it should be.
> > I found the warning was disabled by:
> > commit dbae2b062824 ("net: skb: introduce and use a single page frag
> > cache")
> >
> > This RFC patch to __napi_alloc_skb() determines if an skbuff allocation
> > with a head fragment of size GRO_MAX_HEAD will fit in a 1k allocation,
> > increasing the allocation to 2k if needed.
> >
> > I have functionally tested this patch, performance testing is still needed.
> >
> > TBD: Remove the limitation on 4k page size from the single page frag cache
> > allowing ppc64le (64K page size) to benefit from this change.
> >
> > TBD: I have not addressed the warning in napi_get_frags_check() on x86_64.
> > Will the warning still be needed once the other changes are completed?
>
>
> Thanks for the detailed analysis.
>
> As mentioned by Eric in commit
> bf9f1baa279f0758dc2297080360c5a616843927, it should now be possible to
> revert dbae2b062824 without incurring performance regressions for
> the relevant use-case. That has been on my todo list for a long time,
> but I was unable to allocate time for it.
>
> I think such a revert would be preferable. Would you be able to
> evaluate that option?
I don't think reverting dbae2b062824 would fix David's issue.
The problem is that with MAX_SKB_FRAGS=45, skb_shared_info becomes
huge, so 1024 is not enough for those small packets, and we use a
pagefrag instead of kmalloc, which makes napi_get_frags_check unhappy.
Even after reverting dbae2b062824, we would still go through the
pagefrag path and not __alloc_skb.
What about something like this? (boot-tested on x86 only, but I
disabled NAPI_HAS_SMALL_PAGE_FRAG. no perf testing at all.)
-------- 8< --------
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f85e6989c36c..88923b7b64fe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -108,6 +108,8 @@ static struct kmem_cache *skbuff_ext_cache __ro_after_init;
#define SKB_SMALL_HEAD_HEADROOM \
SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
+#define SKB_SMALL_HEAD_THRESHOLD (SKB_SMALL_HEAD_HEADROOM + NET_SKB_PAD + NET_IP_ALIGN)
+
int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
EXPORT_SYMBOL(sysctl_max_skb_frags);
@@ -726,7 +728,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
/* If requested length is either too small or too big,
* we use kmalloc() for skb->head allocation.
*/
- if (len <= SKB_WITH_OVERHEAD(1024) ||
+ if (len <= SKB_SMALL_HEAD_THRESHOLD ||
len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
(gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
@@ -802,7 +804,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
* When the small frag allocator is available, prefer it over kmalloc
* for small fragments
*/
- if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
+ if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_SMALL_HEAD_THRESHOLD) ||
len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
(gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
-------- 8< --------
(__)napi_alloc_skb extends the GRO_MAX_HEAD size by NET_SKB_PAD +
NET_IP_ALIGN, so I added them here as well. Mainly this is reusing a
size that we know is big enough to fit a small header plus whatever
size skb_shared_info is on the current build. Maybe this could be
max(SKB_WITH_OVERHEAD(1024), <...>) to preserve the current behavior
on MAX_SKB_FRAGS=17, since in that case
SKB_WITH_OVERHEAD(1024) > SKB_SMALL_HEAD_HEADROOM.
--
Sabrina