[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9b4624a-214d-2f49-3883-4d62f60a9de4@linux.dev>
Date: Thu, 22 Feb 2024 16:56:35 +0800
From: Yajun Deng <yajun.deng@...ux.dev>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, akpm@...ux-foundation.org,
vbabka@...e.cz, lstoakes@...il.com, surenb@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/mmap: convert all mas except mas_detach to vma
iterator
On 2024/2/21 22:31, Liam R. Howlett wrote:
> * Yajun Deng <yajun.deng@...ux.dev> [240220 22:26]:
>> On 2024/2/21 02:06, Liam R. Howlett wrote:
>>> * Yajun Deng <yajun.deng@...ux.dev> [240218 21:30]:
>>>> Cc: Vlastimil, Lorenzo,Suren
>>>>
>>>> On 2024/2/18 10:31, Yajun Deng wrote:
>>>>> There are two types of iterators mas and vmi in the current code. If the
>>>>> maple tree comes from the mm struct, we can use vma iterator. Avoid using
>>>>> mas directly.
>>> Thanks for looking at this.
>>>
>>> I had left the maple state exposed in the mmap.c file because it does a
>>> number of operations that no one else does, so the new functions will be
>>> called a very limited number of times (as low as once).
>>>
>>> I think this is a worth while change since this may be needed in the
>>> future for dealing with more specialised uses of the tree. It also
>>> removes the 0/ULONG_MAX limits from certain calls, and the vma iterator
>>> names help explain things.
>>>
>>> I don't know why you treat the low/high search differently than the
>>> mas_reset() and mas_*_range(). In any case, the comment is inaccurate
>>> when mas_ functions are called with &vmi.mas in places.
>>>
>>>
>> Because the mas_reset() and mas_*_range() only covert mas to vmi. It's
>> simple.
>>
>> But the low/high also covert max to max - 1. It's a little more complex.
> Ah, so the code doesn't match the comment, since the code will still use
> mas directly in this version. This was, perhaps, the largest issue with
> the patch. Having a good patch log is very important as people rely on
> it during reviews, but more importantly when tracking down an issue
> later on.
>
> I like the idea of removing as many mas uses as feasible, but we will
> still have a few that must be passed through, so please change the
> wording.
>
Okay.
>>>>> Leave the mt_detach tree keep using mas, as it doesn't come from the mm
>>>>> struct.
>>> Well, it's still VMAs from the mm struct. I agree that we should not
>>> change this for now.
>>>
>>>>> Convert all mas except mas_detach to vma iterator. And introduce
>>>>> vma_iter_area_{lowest, highest} helper functions for use vma interator.
>>> Do you mean mas functions? You do pass the maple state through to other
>>> areas. ie: free_pgtables().
>>
>> Yes.
> ...
>>>>> retry:
>>>>> - if (mas_empty_area_rev(&mas, low_limit, high_limit - 1, length))
>>>>> + if (vma_iter_area_highest(&vmi, low_limit, high_limit, length))
>>>>> return -ENOMEM;
>>>>> - gap = mas.last + 1 - info->length;
>>>>> + gap = vma_iter_end(&vmi) - info->length;
>>>>> gap -= (gap - info->align_offset) & info->align_mask;
>>>>> - gap_end = mas.last;
>>>>> - tmp = mas_next(&mas, ULONG_MAX);
>>>>> + gap_end = vma_iter_end(&vmi);
>>> vma_iter_end will return vmi->mas.last + 1, is what you have here
>>> correct?
>>>
>> Yes, the following changes 'if (vm_start_gap(tmp) <= gap_end)' to 'if
>> (vm_start_gap(tmp) < gap_end)'.
>>
>>>>> + tmp = vma_next(&vmi);
>>>>> if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
>>>>> - if (vm_start_gap(tmp) <= gap_end) {
>>>>> + if (vm_start_gap(tmp) < gap_end) {
> Thanks. This works and the variable isn't used again.
>
> ...
>>>>> @@ -1959,11 +1958,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>>>>> struct vm_area_struct *next;
>>>>> unsigned long gap_addr;
>>>>> int error = 0;
>>>>> - MA_STATE(mas, &mm->mm_mt, vma->vm_start, address);
>>>>> + VMA_ITERATOR(vmi, mm, 0);
>>>>> if (!(vma->vm_flags & VM_GROWSUP))
>>>>> return -EFAULT;
>>>>> + vma_iter_config(&vmi, vma->vm_start, address);
>>> This is confusing. I think you are doing this so that the vma iterator
>>> is set up the same as the maple state, and not what is logically
>>> necessary?
>>
>> Yes, VMA_ITERATOR can only pass one address.
>>
>>>>> /* Guard against exceeding limits of the address space. */
>>>>> address &= PAGE_MASK;
>>>>> if (address >= (TASK_SIZE & PAGE_MASK))
>>>>> @@ -1985,15 +1985,15 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>>>>> }
>>>>> if (next)
>>>>> - mas_prev_range(&mas, address);
>>>>> + mas_prev_range(&vmi.mas, address);
>>> This isn't really hiding the maple state.
>>
>> Okay, I will create a new helper function for this in the mm/internal.h.
>>
>>>
>>>>> - __mas_set_range(&mas, vma->vm_start, address - 1);
>>>>> - if (mas_preallocate(&mas, vma, GFP_KERNEL))
>>>>> + vma_iter_config(&vmi, vma->vm_start, address);
>>> The above maple state changes is to get the maple state to point to the
>>> correct area for the preallocation call below. This seems unnecessary
>>> to me.
>>>
>>> We really should just set it up correctly. Unfortunately, with the VMA
>>> iterator, that's not really possible on initialization.
>>>
>>> What we can do is use the vma->vm_start for the initialization, then use
>>> vma_iter_config() here. That will not reset any state - but that's fine
>>> because the preallocation is the first call that actually uses it
>>> anyways.
>>>
>>> So we can initialize with vma->vm_start, don't call vma_iter_config
>>> until here, and also drop the if (next) part.
>>>
>>> This is possible here because it's not optimised like the
>>> expand_upwards() case, which uses the state to check prev and avoids an
>>> extra walk.
>>>
>>> Please make sure to test with the ltp tests on the stack combining, etc
>>> on a platform that expands down.
It seems something wrong about this description. This change is in
expand_upwards(), but not in
expand_downwards(). So we should test it on a platform that expands up.
And drop the if (next) part
is unnecessary. Did I get that right?
>>
>> Okay, I will test it.
> Testing this can be tricky. Thanks for looking at it.
>
> ...
>>>>> mmap_write_lock(mm);
>>>>> mt_clear_in_rcu(&mm->mm_mt);
>>>>> - mas_set(&mas, vma->vm_end);
>>>>> - free_pgtables(&tlb, &mas, vma, FIRST_USER_ADDRESS,
>>>>> + vma_iter_set(&vmi, vma->vm_end);
>>>>> + free_pgtables(&tlb, &vmi.mas, vma, FIRST_USER_ADDRESS,
>>>>> USER_PGTABLES_CEILING, true);
>>> I guess the page tables still deal with the maple state directly then.
>>
>> Yes.
> That's okay, we can leave that for another time. I believe Peng
> complicated the internals of free_pgtables() with his forking
> optimisation so care will need to be taken and should probably be done
> in another patch, another time. In fact, hiding xa_is_zero() within the
> vma iterator is going to be a really good thing to do, if performance
> doesn't suffer.
>
> Just don't state we are removing the use of maple state in the change
> log - since we do pass it through sometimes.
>
> ...
>
> Thanks again,
> Liam
Powered by blists - more mailing lists