[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <82fa1976-d60b-e1fc-f87d-0f35d2f1eeab@bytedance.com>
Date: Mon, 5 Jun 2023 23:06:57 +0800
From: Peng Zhang <zhangpeng.00@...edance.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>
Cc: Peng Zhang <zhangpeng.00@...edance.com>,
"Yin, Fengwei" <fengwei.yin@...el.com>,
maple-tree@...ts.infradead.org, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
"Liu, Yujie" <yujie.liu@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/14] Reduce preallocations for maple tree
在 2023/6/5 22:44, Liam R. Howlett 写道:
> * Peng Zhang <zhangpeng.00@...edance.com> [230605 10:27]:
>>
>>
>> 在 2023/6/5 22:03, Liam R. Howlett 写道:
>>> * Peng Zhang <zhangpeng.00@...edance.com> [230605 03:59]:
>>>>
>>>>
>>>> 在 2023/6/5 14:18, Yin, Fengwei 写道:
>>>>>
>>>>>
>>>>> On 6/5/2023 12:41 PM, Yin Fengwei wrote:
>>>>>> Hi Peng,
>>>>>>
>>>>>> On 6/5/23 11:28, Peng Zhang wrote:
>>>>>>>
>>>>>>>
>>>>>>> 在 2023/6/2 16:10, Yin, Fengwei 写道:
>>>>>>>> Hi Liam,
>>>>>>>>
>>>>>>>> On 6/1/2023 10:15 AM, Liam R. Howlett wrote:
>>>>>>>>> Initial work on preallocations showed no regression in performance
>>>>>>>>> during testing, but recently some users (both on [1] and off [android]
>>>>>>>>> list) have reported that preallocating the worst-case number of nodes
>>>>>>>>> has caused some slow down. This patch set addresses the number of
>>>>>>>>> allocations in a few ways.
>>>>>>>>>
>>>>>>>>> During munmap() most munmap() operations will remove a single VMA, so
>>>>>>>>> leverage the fact that the maple tree can place a single pointer at
>>>>>>>>> range 0 - 0 without allocating. This is done by changing the index in
>>>>>>>>> the 'sidetree'.
>>>>>>>>>
>>>>>>>>> Re-introduce the entry argument to mas_preallocate() so that a more
>>>>>>>>> intelligent guess of the node count can be made.
>>>>>>>>>
>>>>>>>>> Patches are in the following order:
>>>>>>>>> 0001-0002: Testing framework for benchmarking some operations
>>>>>>>>> 0003-0004: Reduction of maple node allocation in sidetree
>>>>>>>>> 0005: Small cleanup of do_vmi_align_munmap()
>>>>>>>>> 0006-0013: mas_preallocate() calculation change
>>>>>>>>> 0014: Change the vma iterator order
>>>>>>>> I did run The AIM:page_test on an IceLake 48C/96T + 192G RAM platform with
>>>>>>>> this patchset.
>>>>>>>>
>>>>>>>> The result has a little bit improvement:
>>>>>>>> Base (next-20230602):
>>>>>>>> 503880
>>>>>>>> Base with this patchset:
>>>>>>>> 519501
>>>>>>>>
>>>>>>>> But they are far from the none-regression result (commit 7be1c1a3c7b1):
>>>>>>>> 718080
>>>>>>>>
>>>>>>>>
>>>>>>>> Some other information I collected:
>>>>>>>> With Base, the mas_alloc_nodes are always hit with request: 7.
>>>>>>>> With this patchset, the request are 1 or 5.
>>>>>>>>
>>>>>>>> I suppose this is the reason for improvement from 503880 to 519501.
>>>>>>>>
>>>>>>>> With commit 7be1c1a3c7b1, mas_store_gfp() in do_brk_flags never triggered
>>>>>>>> mas_alloc_nodes() call. Thanks.
>>>>>>> Hi Fengwei,
>>>>>>>
>>>>>>> I think it may be related to the inaccurate number of nodes allocated
>>>>>>> in the pre-allocation. I slightly modified the pre-allocation in this
>>>>>>> patchset, but I don't know if it works. It would be great if you could
>>>>>>> help test it, and help pinpoint the cause. Below is the diff, which can
>>>>>>> be applied based on this pachset.
>>>>>> I tried the patch, it could eliminate the call of mas_alloc_nodes() during
>>>>>> the test. But the result of benchmark got a little bit improvement:
>>>>>> 529040
>>>>>>
>>>>>> But it's still much less than none-regression result. I will also double
>>>>>> confirm the none-regression result.
>>>>> Just noticed that the commit f5715584af95 make validate_mm() two implementation
>>>>> based on CONFIG_DEBUG_VM instead of CONFIG_DEBUG_VM_MAPPLE_TREE). I have
>>>>> CONFIG_DEBUG_VM but not CONFIG_DEBUG_VM_MAPPLE_TREE defined. So it's not an
>>>>> apple to apple.
>>>
>>> You mean "mm: update validate_mm() to use vma iterator" here I guess. I
>>> have it as a different commit id in my branch.
>>>
>>> I 'restored' some of the checking because I was able to work around not
>>> having the mt_dump() definition with the vma iterator. I'm now
>>> wondering how wide spread CONFIG_DEBUG_VM is used and if I should not
>>> have added these extra checks.
>>>
>>>>>
>>>>>
>>>>> I disable CONFIG_DEBUG_VM and re-run the test and got:
>>>>> Before preallocation change (7be1c1a3c7b1):
>>>>> 770100
>>>>> After preallocation change (28c5609fb236):
>>>>> 680000
>>>>> With liam's fix:
>>>>> 702100
>>>>> plus Peng's fix:
>>>>> 725900
>>>> Thank you for your test, now it seems that the performance
>>>> regression is not so much.
>>>
>>> We are also too strict on the reset during mas_store_prealloc() checking
>>> for a spanning write. I have a fix for this for v2 of the patch set,
>>> although I suspect it will not make a huge difference.
>>>
>>>>>
>>>>>
>>>>> Regards
>>>>> Yin, Fengwei
>>>>>
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Yin, Fengwei
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Peng
>>>>>>>
>>>>>>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>>>>>>> index 5ea211c3f186..e67bf2744384 100644
>>>>>>> --- a/lib/maple_tree.c
>>>>>>> +++ b/lib/maple_tree.c
>>>>>>> @@ -5575,9 +5575,11 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
>>>>>>> goto ask_now;
>>>>>>> }
>>>>>>>
>>>>>>> - /* New root needs a singe node */
>>>>>>> - if (unlikely(mte_is_root(mas->node)))
>>>>>>> - goto ask_now;
>>>
>>> Why did you drop this? If we are creating a new root we will only need
>>> one node.
>> The code below handles the root case perfectly,
>> we don't need additional checks.
>> if (node_size - 1 <= mt_min_slots[wr_mas.type])
>> request = mas_mt_height(mas) * 2 - 1;
>
> Unless we have the minimum for a node, in which case we will fall
> through and ask for one node. This works and is rare enough so I'll
> drop it. Thanks.
>
>>>
>>>>>>> + if ((node_size == wr_mas.node_end + 1 &&
>>>>>>> + mas->offset == wr_mas.node_end) ||
>>>>>>> + (node_size == wr_mas.node_end &&
>>>>>>> + wr_mas.offset_end - mas->offset == 1))
>>>>>>> + return 0;
>>>
>>> I will add this to v2 as well, or something similar.
>>>
>>>>>>>
>>>>>>> /* Potential spanning rebalance collapsing a node, use worst-case */
>>>>>>> if (node_size - 1 <= mt_min_slots[wr_mas.type])
>>>>>>> @@ -5590,7 +5592,6 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
>>>>>>> if (likely(!mas_is_err(mas)))
>>>>>>> return 0;
>>>>>>>
>>>>>>> - mas_set_alloc_req(mas, 0);
>>>
>>> Why did you drop this? It seems like a worth while cleanup on failure.
>> Because we will clear it in mas_node_count_gfp()->mas_alloc_nodes().
>
> On failure we set the alloc request to the remainder of what was not
> allocated.
Yes, you are right.
>
>>>
>>>>>>> ret = xa_err(mas->node);
>>>>>>> mas_reset(mas);
>>>>>>> mas_destroy(mas);
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Yin, Fengwei
>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] https://lore.kernel.org/linux-mm/202305061457.ac15990c-yujie.liu@intel.com/
>>>>>>>>>
>>>>>>>>> Liam R. Howlett (14):
>>>>>>>>> maple_tree: Add benchmarking for mas_for_each
>>>>>>>>> maple_tree: Add benchmarking for mas_prev()
>>>>>>>>> mm: Move unmap_vmas() declaration to internal header
>>>>>>>>> mm: Change do_vmi_align_munmap() side tree index
>>>>>>>>> mm: Remove prev check from do_vmi_align_munmap()
>>>>>>>>> maple_tree: Introduce __mas_set_range()
>>>>>>>>> mm: Remove re-walk from mmap_region()
>>>>>>>>> maple_tree: Re-introduce entry to mas_preallocate() arguments
>>>>>>>>> mm: Use vma_iter_clear_gfp() in nommu
>>>>>>>>> mm: Set up vma iterator for vma_iter_prealloc() calls
>>>>>>>>> maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null()
>>>>>>>>> maple_tree: Update mas_preallocate() testing
>>>>>>>>> maple_tree: Refine mas_preallocate() node calculations
>>>>>>>>> mm/mmap: Change vma iteration order in do_vmi_align_munmap()
>>>>>>>>>
>>>>>>>>> fs/exec.c | 1 +
>>>>>>>>> include/linux/maple_tree.h | 23 ++++-
>>>>>>>>> include/linux/mm.h | 4 -
>>>>>>>>> lib/maple_tree.c | 78 ++++++++++----
>>>>>>>>> lib/test_maple_tree.c | 74 +++++++++++++
>>>>>>>>> mm/internal.h | 40 ++++++--
>>>>>>>>> mm/memory.c | 16 ++-
>>>>>>>>> mm/mmap.c | 171 ++++++++++++++++---------------
>>>>>>>>> mm/nommu.c | 45 ++++----
>>>>>>>>> tools/testing/radix-tree/maple.c | 59 ++++++-----
>>>>>>>>> 10 files changed, 331 insertions(+), 180 deletions(-)
>>>>>>>>>
Powered by blists - more mailing lists