lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 22 Feb 2024 16:56:35 +0800
From: Yajun Deng <yajun.deng@...ux.dev>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, akpm@...ux-foundation.org,
 vbabka@...e.cz, lstoakes@...il.com, surenb@...gle.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/mmap: convert all mas except mas_detach to vma
 iterator


On 2024/2/21 22:31, Liam R. Howlett wrote:
> * Yajun Deng <yajun.deng@...ux.dev> [240220 22:26]:
>> On 2024/2/21 02:06, Liam R. Howlett wrote:
>>> * Yajun Deng <yajun.deng@...ux.dev> [240218 21:30]:
>>>> Cc:  Vlastimil, Lorenzo,Suren
>>>>
>>>> On 2024/2/18 10:31, Yajun Deng wrote:
>>>>> There are two types of iterators mas and vmi in the current code. If the
>>>>> maple tree comes from the mm struct, we can use vma iterator. Avoid using
>>>>> mas directly.
>>> Thanks for looking at this.
>>>
>>> I had left the maple state exposed in the mmap.c file because it does a
>>> number of operations that no one else does, so the new functions will be
>>> called a very limited number of times (as low as once).
>>>
>>> I think this is a worth while change since this may be needed in the
>>> future for dealing with more specialised uses of the tree.  It also
>>> removes the 0/ULONG_MAX limits from certain calls, and the vma iterator
>>> names help explain things.
>>>
>>> I don't know why you treat the low/high search differently than the
>>> mas_reset() and mas_*_range(). In any case, the comment is inaccurate
>>> when mas_ functions are called with &vmi.mas in places.
>>>
>>>
>> Because the mas_reset() and mas_*_range() only covert mas to vmi. It's
>> simple.
>>
>> But the low/high also covert max to max - 1. It's a little more complex.
> Ah, so the code doesn't match the comment, since the code will still use
> mas directly in this version.  This was, perhaps, the largest issue with
> the patch.  Having a good patch log is very important as people rely on
> it during reviews, but more importantly when tracking down an issue
> later on.
>
> I like the idea of removing as many mas uses as feasible, but we will
> still have a few that must be passed through, so please change the
> wording.
>

Okay.

>>>>> Leave the mt_detach tree keep using mas, as it doesn't come from the mm
>>>>> struct.
>>> Well, it's still VMAs from the mm struct.  I agree that we should not
>>> change this for now.
>>>
>>>>> Convert all mas except mas_detach to vma iterator. And introduce
>>>>> vma_iter_area_{lowest, highest} helper functions for use vma interator.
>>> Do you mean mas functions?  You do pass the maple state through to other
>>> areas.  ie: free_pgtables().
>>
>> Yes.
> ...
>>>>>     retry:
>>>>> -	if (mas_empty_area_rev(&mas, low_limit, high_limit - 1, length))
>>>>> +	if (vma_iter_area_highest(&vmi, low_limit, high_limit, length))
>>>>>     		return -ENOMEM;
>>>>> -	gap = mas.last + 1 - info->length;
>>>>> +	gap = vma_iter_end(&vmi) - info->length;
>>>>>     	gap -= (gap - info->align_offset) & info->align_mask;
>>>>> -	gap_end = mas.last;
>>>>> -	tmp = mas_next(&mas, ULONG_MAX);
>>>>> +	gap_end = vma_iter_end(&vmi);
>>> vma_iter_end will return vmi->mas.last + 1, is what you have here
>>> correct?
>>>
>> Yes, the following changes 'if (vm_start_gap(tmp) <= gap_end)' to 'if
>> (vm_start_gap(tmp) < gap_end)'.
>>
>>>>> +	tmp = vma_next(&vmi);
>>>>>     	if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
>>>>> -		if (vm_start_gap(tmp) <= gap_end) {
>>>>> +		if (vm_start_gap(tmp) < gap_end) {
> Thanks.  This works and the variable isn't used again.
>
> ...
>>>>> @@ -1959,11 +1958,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>>>>>     	struct vm_area_struct *next;
>>>>>     	unsigned long gap_addr;
>>>>>     	int error = 0;
>>>>> -	MA_STATE(mas, &mm->mm_mt, vma->vm_start, address);
>>>>> +	VMA_ITERATOR(vmi, mm, 0);
>>>>>     	if (!(vma->vm_flags & VM_GROWSUP))
>>>>>     		return -EFAULT;
>>>>> +	vma_iter_config(&vmi, vma->vm_start, address);
>>> This is confusing.  I think you are doing this so that the vma iterator
>>> is set up the same as the maple state, and not what is logically
>>> necessary?
>>
>> Yes, VMA_ITERATOR can only pass one address.
>>
>>>>>     	/* Guard against exceeding limits of the address space. */
>>>>>     	address &= PAGE_MASK;
>>>>>     	if (address >= (TASK_SIZE & PAGE_MASK))
>>>>> @@ -1985,15 +1985,15 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>>>>>     	}
>>>>>     	if (next)
>>>>> -		mas_prev_range(&mas, address);
>>>>> +		mas_prev_range(&vmi.mas, address);
>>> This isn't really hiding the maple state.
>>
>> Okay,  I will create a new helper function for this in the mm/internal.h.
>>
>>>
>>>>> -	__mas_set_range(&mas, vma->vm_start, address - 1);
>>>>> -	if (mas_preallocate(&mas, vma, GFP_KERNEL))
>>>>> +	vma_iter_config(&vmi, vma->vm_start, address);
>>> The above maple state changes is to get the maple state to point to the
>>> correct area for the preallocation call below.  This seems unnecessary
>>> to me.
>>>
>>> We really should just set it up correctly.  Unfortunately, with the VMA
>>> iterator, that's not really possible on initialization.
>>>
>>> What we can do is use the vma->vm_start for the initialization, then use
>>> vma_iter_config() here.  That will not reset any state - but that's fine
>>> because the preallocation is the first call that actually uses it
>>> anyways.
>>>
>>> So we can initialize with vma->vm_start, don't call vma_iter_config
>>> until here, and also drop the if (next) part.
>>>
>>> This is possible here because it's not optimised like the
>>> expand_upwards() case, which uses the state to check prev and avoids an
>>> extra walk.
>>>
>>> Please make sure to test with the ltp tests on the stack combining, etc
>>> on a platform that expands down.


It seems something wrong about this description. This change is in 
expand_upwards(), but not in

expand_downwards(). So we should test it on a platform that expands up. 
And drop the if (next) part

is unnecessary. Did I get that right?

>>
>> Okay, I will test it.
> Testing this can be tricky.  Thanks for looking at it.
>
> ...
>>>>>     	mmap_write_lock(mm);
>>>>>     	mt_clear_in_rcu(&mm->mm_mt);
>>>>> -	mas_set(&mas, vma->vm_end);
>>>>> -	free_pgtables(&tlb, &mas, vma, FIRST_USER_ADDRESS,
>>>>> +	vma_iter_set(&vmi, vma->vm_end);
>>>>> +	free_pgtables(&tlb, &vmi.mas, vma, FIRST_USER_ADDRESS,
>>>>>     		      USER_PGTABLES_CEILING, true);
>>> I guess the page tables still deal with the maple state directly then.
>>
>> Yes.
> That's okay, we can leave that for another time.  I believe Peng
> complicated the internals of free_pgtables() with his forking
> optimisation so care will need to be taken and should probably be done
> in another patch, another time.  In fact, hiding xa_is_zero() within the
> vma iterator is going to be a really good thing to do, if performance
> doesn't suffer.
>
> Just don't state we are removing the use of maple state in the change
> log - since we do pass it through sometimes.
>
> ...
>
> Thanks again,
> Liam

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ