lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 26 Apr 2024 11:25:59 -0700
From: David J Wilder <dwilder@...ibm.com>
To: edumazet@...gle.com
Cc: dwilder@...ux.ibm.com, netdev@...r.kernel.org, pabeni@...hat.com,
        sd@...asysnail.net, wilder@...ibm.com
Subject: Re: [RFC PATCH] net: skb: Increasing allocation in __napi_alloc_skb() to 2k when needed.

On Wed, Apr 24, 2024 at 10:49 PM Sabrina Dubroca <sd@...asysnail.net> wrote:
>>
>> 2024-04-23, 09:56:33 +0200, Paolo Abeni wrote:
>> > On Fri, 2024-04-19 at 15:23 -0700, David J Wilder wrote:
>> > > When testing CONFIG_MAX_SKB_FRAGS=45 on ppc64le and x86_64 I ran into a
>> > > couple of issues.
>> > >
>> > > __napi_alloc_skb() assumes its smallest fragment allocations will fit in
>> > > 1K. When CONFIG_MAX_SKB_FRAGS is increased this may no longer be true
>> > > resulting in __napi_alloc_skb() reverting to using page_frag_alloc().
>> > > This results in the return of the bug fixed in:
>> > > Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for
>> > > tiny skbs")
>> > >
>> > > That commit insured that "small skb head fragments are kmalloc backed,
>> > > so that other objects in the slab page can be reused instead of being held
>> > > as long as skbs are sitting in socket queues."
>> > >
>> > > On ppc64le the warning from napi_get_frags_check() is displayed when
>> > > CONFIG_MAX_SKB_FRAGS is set to 45. The purpose of the warning is to detect
>> > > when an increase of MAX_SKB_FRAGS has reintroduced the aforementioned bug.
>> > > Unfortunately on x86_64 this warning is not seen, even though it should be.
>> > > I found the warning was disabled by:
>> > > commit dbae2b062824 ("net: skb: introduce and use a single page frag
>> > > cache")
>> > >
>> > > This RFC patch to __napi_alloc_skb() determines if an skbuff allocation
>> > > with a head fragment of size GRO_MAX_HEAD will fit in a 1k allocation,
>> > > increasing the allocation to 2k if needed.
>> > >
>> > > I have functionally tested this patch, performance testing is still needed.
>> > >
>> > > TBD: Remove the limitation on 4k page size from the single page frag cache
>> > > allowing ppc64le (64K page size) to benefit from this change.
>> > >
>> > > TBD: I have not address the warning in napi_get_frags_check() on x86_64.
>> > > Will the warning still be needed once the other changes are completed?
>> >
>> >
>> > Thanks for the detailed analysis.
>> >
>> > As mentioned by Eric in commit
>> > bf9f1baa279f0758dc2297080360c5a616843927, it should be now possible to
>> > revert dbae2b062824 without incurring in performance regressions for
>> > the relevant use-case. I had that on my todo list since a lot of time,
>> > but I was unable to allocate time for that.
>> >
>> > I think such revert would be preferable. Would you be able to evaluate
>> > such option?

Thanks Paolo,  yes, I can evaluate removing dbae2b062824. 

>>
>> I don't think reverting dbae2b062824 would fix David's issue.
>>
>> The problem is that with MAX_SKB_FRAGS=45, skb_shared_info becomes
>> huge, so 1024 is not enough for those small packets, and we use a
>> pagefrag instead of kmalloc, which makes napi_get_frags_check unhappy.
>>
>
> 768 bytes is not huge ...
>
>> Even after reverting dbae2b062824, we would still go through the
>> pagefrag path and not __alloc_skb.
>>
>> What about something like this?  (boot-tested on x86 only, but I
>> disabled NAPI_HAS_SMALL_PAGE_FRAG. no perf testing at all.)
>>
>> -------- 8< --------
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index f85e6989c36c..88923b7b64fe 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -108,6 +108,8 @@ static struct kmem_cache *skbuff_ext_cache __ro_after_init;
>>  #define SKB_SMALL_HEAD_HEADROOM                                                \
>>         SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
>>
>> +#define SKB_SMALL_HEAD_THRESHOLD (SKB_SMALL_HEAD_HEADROOM + NET_SKB_PAD + NET_IP_ALIGN)
>> +
>>  int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
>>  EXPORT_SYMBOL(sysctl_max_skb_frags);
>>
>> @@ -726,7 +728,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
>>         /* If requested length is either too small or too big,
>>          * we use kmalloc() for skb->head allocation.
>>          */
>> -       if (len <= SKB_WITH_OVERHEAD(1024) ||
>> +       if (len <= SKB_SMALL_HEAD_THRESHOLD ||
>>             len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
>>             (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
>>                 skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
>> @@ -802,7 +804,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
>>          * When the small frag allocator is available, prefer it over kmalloc
>>          * for small fragments
>>          */
>> -       if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) ||
>> +       if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_SMALL_HEAD_THRESHOLD) ||
>>             len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
>>             (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
>>                 skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI,
>> -------- 8< --------
>>
>> (__)napi_alloc_skb extends the GRO_MAX_HEAD size by NET_SKB_PAD +
>> NET_IP_ALIGN, so I added them here as well. Mainly this is reusing a
>> size that we know if big enough to fit a small header and whatever
>> size skb_shared_info is on the current build. Maybe this could be
>> max(SKB_WITH_OVERHEAD(1024), <...>) to preserve the current behavior
>> on MAX_SKB_FRAGS=17, since in that case
>> SKB_WITH_OVERHEAD(1024) > SKB_SMALL_HEAD_HEADROOM
>>
>>

Thanks for the suggestion Sabrina, I will incorporate it. 

> 
> Here we simply use
> 
> #define GRO_MAX_HEAD 192

Sorry Eric, where did 192 come from?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ