Message-ID: <e14634b8-eb26-47da-b171-655abcd46cfe@huawei.com>
Date: Wed, 28 Aug 2024 20:12:55 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Andrew Morton
	<akpm@...ux-foundation.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH net-next v15 08/13] mm: page_frag: use __alloc_pages() to replace alloc_pages_node()

On 2024/8/27 23:30, Alexander Duyck wrote:
> On Tue, Aug 27, 2024 at 5:07 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>
>> On 2024/8/27 1:00, Alexander Duyck wrote:
>>> On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>>>
>>>> There is about a 24-byte binary size increase for
>>>> __page_frag_cache_refill() after the refactoring on an arm64
>>>> system with 64K PAGE_SIZE. Disassembling with gdb suggests we
>>>> can get a decrease of more than 100 bytes in binary size by
>>>> using __alloc_pages() to replace alloc_pages_node(), as the
>>>> latter does some unnecessary checking for nid being
>>>> NUMA_NO_NODE, especially since page_frag is part of the mm
>>>> system.
>>>>
>>>> CC: Alexander Duyck <alexander.duyck@...il.com>
>>>> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
>>>> ---
>>>>  mm/page_frag_cache.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>>>> index bba59c87d478..e0ad3de11249 100644
>>>> --- a/mm/page_frag_cache.c
>>>> +++ b/mm/page_frag_cache.c
>>>> @@ -28,11 +28,11 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>>         gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
>>>>                    __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>>>> -       page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>>>> -                               PAGE_FRAG_CACHE_MAX_ORDER);
>>>> +       page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
>>>> +                            numa_mem_id(), NULL);
>>>>  #endif
>>>>         if (unlikely(!page)) {
>>>> -               page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>>>> +               page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
>>>>                 if (unlikely(!page)) {
>>>>                         nc->encoded_page = 0;
>>>>                         return NULL;
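For context on the "unnecessary checking" mentioned above: with a
non-constant nid, alloc_pages_node() expands to roughly the following
before reaching __alloc_pages(), so calling __alloc_pages() with
numa_mem_id() directly skips both the NUMA_NO_NODE fallback branch and
the nid sanity checks. This is a sketch of the mainline helpers around
that time, modulo the alloc_hooks/_noprof plumbing, not the exact tree
under discussion:

	/* NUMA_NO_NODE fallback in alloc_pages_node() */
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();

	/* nid sanity checks in __alloc_pages_node() */
	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
	warn_if_node_offline(nid, gfp_mask);

	return __alloc_pages(gfp_mask, order, nid, NULL);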
>>>
>>> I still think this would be better served by fixing alloc_pages_node
>>> to drop the superfluous checks rather than changing which function is
>>> called here. We would get more gain by just addressing the builtin
>>> constant and NUMA_NO_NODE case there.
>>
>> I am assuming that by 'just addressing the builtin constant and
>> NUMA_NO_NODE case' you meant the change below from the previous
>> discussion:
>>
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index 01a49be7c98d..009ffb50d8cd 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -290,6 +290,9 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
>>  static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
>>                                                    unsigned int order)
>>  {
>> +       if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
>> +               return __alloc_pages_noprof(gfp_mask, order, numa_mem_id(), NULL);
>> +
>>         if (nid == NUMA_NO_NODE)
>>                 nid = numa_mem_id();
>>
>>
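For reference, the helper after that change would look roughly like
this. It is a sketch only: the unquoted tail of the function is
reconstructed from the mainline gfp.h of that period, not taken from
the diff above:

static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
						   unsigned int order)
{
	/* Compile-time NUMA_NO_NODE callers go straight to the core
	 * allocator, letting the compiler drop the nid checks below. */
	if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
		return __alloc_pages_noprof(gfp_mask, order, numa_mem_id(), NULL);

	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();

	return __alloc_pages_node_noprof(nid, gfp_mask, order);
}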
>> Actually, judging from the binary size change below, it does not seem
>> to bring more gain; vmlinux.org is the image after this patchset, and
>> vmlinux is the image after this patchset with this patch reverted and
>> the above change applied.
>>
>> [linyunsheng@...alhost net-next]$ ./scripts/bloat-o-meter vmlinux.org vmlinux
>> add/remove: 0/2 grow/shrink: 16/12 up/down: 432/-340 (92)
>> Function                                     old     new   delta
>> new_slab                                     808    1124    +316
>> its_probe_one                               2860    2908     +48
> 
> ...
> 
>> alloc_slab_page                              284       -    -284
>> Total: Before=30534822, After=30534914, chg +0.00%
> 
> Well, considering that alloc_slab_page was marked "inline", I would
> say the shrinking had an effect, but only enough to let the "inline"
> qualifier kick in. It could be argued that the change exposed another
> issue: the alloc_slab_page function is probably large enough that it
> should just be "static" and not "static inline". If you can provide
> your config I could probably look into this further, but I suspect
> just dropping the inline for that one function would result in a net
> saving.
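As an illustrative aside, the suggested tweak would look something like
the following in mm/slub.c (a sketch assuming the then-current
signature; the function body and surrounding context are elided):

-static inline struct slab *alloc_slab_page(gfp_t flags, int node,
-					    struct kmem_cache_order_objects oo)
+static struct slab *alloc_slab_page(gfp_t flags, int node,
+				    struct kmem_cache_order_objects oo)

With plain "static", the compiler is still free to inline the function
where profitable, but is no longer nudged toward duplicating a fairly
large body into every caller.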

The config is for the arm64 system; please see the attachment.

> 
> The only other big change I see is in its_probe_one, and I am not sure
> why it would be impacted since it is not passing a constant in the
> first place; it is passing its->numa_node. I'd be curious what the
> disassembly shows about the change that caused it to increase in size.
> 
> Otherwise the rest of the size changes look more like code shifts than
> anything else, most likely because functions moved around slightly as
> a few of them dropped in size.
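For anyone reproducing the comparison above, something like the
following would show both effects (the grep pattern and the choice of
new_slab are just illustrations; vmlinux.org and vmlinux are the two
builds being compared, as earlier in the thread):

$ ./scripts/bloat-o-meter vmlinux.org vmlinux | grep -E 'new_slab|alloc_slab_page'
$ gdb -batch -ex 'disassemble new_slab' vmlinux

The first command shows whether alloc_slab_page still exists as a
standalone symbol; the second shows the inlined result inside new_slab.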
View attachment "config_alloc_page_opt" of type "text/plain" (285641 bytes)
