Date: Wed, 5 Jun 2024 18:16:43 +0200
From: David Hildenbrand <david@...hat.com>
To: Lance Yang <ioworker0@...il.com>
Cc: Yin Fengwei <fengwei.yin@...el.com>, akpm@...ux-foundation.org,
 willy@...radead.org, sj@...nel.org, baolin.wang@...ux.alibaba.com,
 maskray@...gle.com, ziy@...dia.com, ryan.roberts@....com, 21cnbao@...il.com,
 mhocko@...e.com, zokeefe@...gle.com, shy828301@...il.com,
 xiehuan09@...il.com, libang.li@...group.com, wangkefeng.wang@...wei.com,
 songmuchun@...edance.com, peterx@...hat.com, minchan@...nel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into
 pagewalk loop

On 05.06.24 17:43, Lance Yang wrote:
> On Wed, Jun 5, 2024 at 11:03 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 05.06.24 16:57, Lance Yang wrote:
>>> On Wed, Jun 5, 2024 at 10:39 PM David Hildenbrand <david@...hat.com> wrote:
>>>>
>>>> On 05.06.24 16:28, David Hildenbrand wrote:
>>>>> On 05.06.24 16:20, Lance Yang wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand <david@...hat.com> wrote:
>>>>>>>
>>>>>>> On 21.05.24 06:02, Lance Yang wrote:
>>>>>>>> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
>>>>>>>> folios, start the pagewalk first, then call split_huge_pmd_address() to
>>>>>>>> split the folio.
>>>>>>>>
>>>>>>>> Since TTU_SPLIT_HUGE_PMD no longer performs the split immediately, we
>>>>>>>> might encounter a PMD-mapped THP that is missing the mlock in a VM_LOCKED
>>>>>>>> range during the page walk. It's probably necessary to mlock this THP to
>>>>>>>> prevent it from being picked up during page reclaim.
>>>>>>>>
>>>>>>>> Suggested-by: David Hildenbrand <david@...hat.com>
>>>>>>>> Suggested-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>>>>>>>> Signed-off-by: Lance Yang <ioworker0@...il.com>
>>>>>>>> ---
>>>>>>>
>>>>>>> [...] again, sorry for the late review.
>>>>>>
>>>>>> No worries at all, thanks for taking time to review!
>>>>>>
>>>>>>>
>>>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>>>> index ddffa30c79fb..08a93347f283 100644
>>>>>>>> --- a/mm/rmap.c
>>>>>>>> +++ b/mm/rmap.c
>>>>>>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>>           if (flags & TTU_SYNC)
>>>>>>>>                   pvmw.flags = PVMW_SYNC;
>>>>>>>>
>>>>>>>> -     if (flags & TTU_SPLIT_HUGE_PMD)
>>>>>>>> -             split_huge_pmd_address(vma, address, false, folio);
>>>>>>>> -
>>>>>>>>           /*
>>>>>>>>            * For THP, we have to assume the worse case ie pmd for invalidation.
>>>>>>>>            * For hugetlb, it could be much worse if we need to do pud
>>>>>>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>>           mmu_notifier_invalidate_range_start(&range);
>>>>>>>>
>>>>>>>>           while (page_vma_mapped_walk(&pvmw)) {
>>>>>>>> -             /* Unexpected PMD-mapped THP? */
>>>>>>>> -             VM_BUG_ON_FOLIO(!pvmw.pte, folio);
>>>>>>>> -
>>>>>>>>                   /*
>>>>>>>>                    * If the folio is in an mlock()d vma, we must not swap it out.
>>>>>>>>                    */
>>>>>>>>                   if (!(flags & TTU_IGNORE_MLOCK) &&
>>>>>>>>                       (vma->vm_flags & VM_LOCKED)) {
>>>>>>>>                           /* Restore the mlock which got missed */
>>>>>>>> -                     if (!folio_test_large(folio))
>>>>>>>> +                     if (!folio_test_large(folio) ||
>>>>>>>> +                         (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
>>>>>>>>                                   mlock_vma_folio(folio, vma);
>>>>>>>
>>>>>>> Can you elaborate why you think this would be required? If we had
>>>>>>> performed the split_huge_pmd_address() beforehand, we would still be
>>>>>>> left with a large folio, no?
>>>>>>
>>>>>> Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
>>>>>>
>>>>>> After Fengwei's series[1], the kernel supports mlock for PTE-mapped large
>>>>>> folios, but there are a few scenarios where we don't mlock a large folio,
>>>>>> such as when it crosses a VM_LOCKED VMA boundary.
>>>>>>
>>>>>>      -                     if (!folio_test_large(folio))
>>>>>>      +                     if (!folio_test_large(folio) ||
>>>>>>      +                         (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
>>>>>>
>>>>>> And this check is just future-proofing and likely unnecessary. If we
>>>>>> encounter a PMD-mapped THP that is missing the mlock for some reason, we
>>>>>> can mlock it to prevent it from being picked up during page reclaim, since
>>>>>> it is fully mapped and doesn't cross a VMA boundary, IIUC.
>>>>>>
>>>>>> What do you think?
>>>>>> I would appreciate any suggestions regarding this check ;)
>>>>>
>>>>> Reading this patch only, I wonder if this change makes sense in the
>>>>> context here.
>>>>>
>>>>> Before this patch, we would have PTE-mapped the PMD-mapped THP before
>>>>> reaching this call and skipped it due to "!folio_test_large(folio)".
>>>>>
>>>>> After this patch, we either
>>>>>
>>>>> a) PTE-remap the THP after this check, but retry and end up here again,
>>>>> whereby we would skip it due to "!folio_test_large(folio)".
>>>>>
>>>>> b) Directly discard the PMD-mapped THP because it is lazyfree. Can that
>>>>> co-exist with mlock, and what would the problem with mlock be here?
>>>>>
>>>>>
>>>
>>> Thanks a lot for clarifying!
>>>
>>>>> So if the check is required in this patch, we really have to understand
>>>>> why. If not, we should better drop it from this patch.
>>>>>
>>>>> That's my opinion, at least; I'm still struggling to understand why it would
>>>>> be required (I have zero knowledge about mlock interaction with large folios :) ).
>>>>>
>>>>
>>>> Looking at that series, in folio_referenced_one(), we do
>>>>
>>>>                           if (!folio_test_large(folio) || !pvmw.pte) {
>>>>                                   /* Restore the mlock which got missed */
>>>>                                   mlock_vma_folio(folio, vma);
>>>>                                   page_vma_mapped_walk_done(&pvmw);
>>>>                                   pra->vm_flags |= VM_LOCKED;
>>>>                                   return false; /* To break the loop */
>>>>                           }
>>>>
>>>> I wonder if we want that here as well now: in case of lazyfree we
>>>> would not back off, right?
>>>>
>>>> But I'm not sure if lazyfree in mlocked areas is even possible.
>>>>
>>>> Adding the "!pvmw.pte" would be much clearer to me than the flag check.
>>>
>>> Hmm... How about we drop it from this patch for now, and add it back if needed
>>> in the future?
>>
>> If we can rule out that MADV_FREE + mlock() stops working as expected in
>> the PMD-mapped case, we're good.
>>
>> Can we rule that out? (especially for MADV_FREE followed by mlock())
> 
> Perhaps we don't need to worry about that.
> 
> IIUC, without that check, MADV_FREE + mlock() still works as expected in
> the PMD-mapped case: if we encounter a large folio in a VM_LOCKED VMA
> range, we stop the page walk immediately.


Can you point me at the code (especially considering patch #3)?

-- 
Cheers,

David / dhildenb
