[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a3862b3f-ef26-46f8-a09e-5484fe3c495b@bytedance.com>
Date: Fri, 16 Aug 2024 17:21:57 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: David Hildenbrand <david@...hat.com>
Cc: hughd@...gle.com, willy@...radead.org, mgorman@...e.de,
muchun.song@...ux.dev, vbabka@...nel.org, akpm@...ux-foundation.org,
zokeefe@...gle.com, rientjes@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock()
return pmdval
Hi David,
On 2024/8/16 16:59, David Hildenbrand wrote:
> On 12.08.24 08:21, Qi Zheng wrote:
>> Hi David,
>>
>> On 2024/8/10 00:54, David Hildenbrand wrote:
>>> On 07.08.24 05:08, Qi Zheng wrote:
>>>> Hi David,
>>>>
>>>> On 2024/8/6 22:16, David Hildenbrand wrote:
>>>>> On 06.08.24 04:40, Qi Zheng wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 2024/8/5 22:43, David Hildenbrand wrote:
>>>>>>> On 05.08.24 14:55, Qi Zheng wrote:
>>>>>>>> Make pte_offset_map_nolock() return pmdval so that we can
>>>>>>>> recheck the
>>>>>>>> *pmd once the lock is taken. This is a preparation for freeing
>>>>>>>> empty
>>>>>>>> PTE pages, no functional changes are expected.
>>>>>>>
>>>>>>> Skimming the patches, only patch #4 updates one of the callsites
>>>>>>> (collapse_pte_mapped_thp).
>>>>>>
>>>>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
>>>>>> also used the pmdval returned by pte_offset_map_nolock().
>>>>>
>>>>> Right, and I am questioning if only touching these two is sufficient,
>>>>> and how we can make it clearer when someone actually has to recheck
>>>>> the
>>>>> PMD.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Wouldn't we have to recheck if the PMD val changed in more cases
>>>>>>> after
>>>>>>> taking the PTL?
>>>>>>>
>>>>>>> If not, would it make sense to have a separate function that
>>>>>>> returns the
>>>>>>> pmdval and we won't have to update each and every callsite?
>>>>>>
>>>>>> pte_offset_map_nolock() had already obtained the pmdval previously,
>>>>>> just
>>>>>> hadn't returned it. And updating those callsite is simple, so I think
>>>>>> there may not be a need to add a separate function.
>>>>>
>>>>> Let me ask this way: why is retract_page_tables() and
>>>>> reclaim_pgtables_pmd_entry() different to the other ones, and how
>>>>> would
>>>>> someone using pte_offset_map_nolock() know what's to do here?
>>>>
>>>> If we acuqire the PTL (PTE or PMD lock) after calling
>>>> pte_offset_map_nolock(), it means we may be modifying the corresponding
>>>> pte or pmd entry. In that case, we need to perform a pmd_same() check
>>>> after holding the PTL, just like in pte_offset_map_lock(), to prevent
>>>> the possibility of the PTE page being reclaimed at that time.
>>>
>>> Okay, what I thought.
>>>
>>>>
>>>> If we call pte_offset_map_nolock() and do not need to acquire the PTL
>>>> afterwards, it means we are only reading the PTE page. In this case,
>>>> the
>>>> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE
>>>> page
>>>> cannot be reclaimed.
>>>>
>>>>>
>>>>> IIUC, we must check the PMDVAL after taking the PTL in case
>>>>>
>>>>> (a) we want to modify the page table to turn pte_none() entries to
>>>>> !pte_none(). Because it could be that the page table was
>>>>> removed and
>>>>> now is all pte_none()
>>>>>
>>>>> (b) we want to remove the page table ourselves and want to check if it
>>>>> has already been removed.
>>>>>
>>>>> Is that it?
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>> So my thinking is if another function variant can make that clearer.
>>>>
>>>> OK, how about naming it pte_offset_map_before_lock?
>>>
>>> That's the issue with some of the code: for example in
>>> filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and
>>> conditionally take the PTL. But we won't be modifying the pages tables.
>>>
>>> Maybe something like:
>>>
>>> pte_offset_map_readonly_nolock()
>>>
>>> and
>>>
>>> pte_offset_map_maywrite_nolock()
>>>
>>> The latter would require you to pass the PMD pointer such that you have
>>> to really mess up to ignore what to do with it (check PMD same or not
>>> check PMD same if you really know what you are douing).
>>>
>>> The first would not take a PMD pointer at all, because there is no
>>> need to.
>>
>> These two function names LGTM. Will do in the next version.
>
> That is probably something you can send as a separate patch independent
> of this full series.
I think it makes sense. I am analyzing whether the existing path of
calling pte_offset_map_nolock + spin_lock will be concurrent with
the path of reclaiming PTE pages in THP. If so, it actually needs to
be fixed first.
>
> Then we might also get more review+thoughts from other folks!
I hope so!
Thanks,
Qi
>
Powered by blists - more mailing lists