lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7f2ab112-5916-422c-b29f-343cc0d6d754@redhat.com>
Date: Wed, 5 Jun 2024 17:02:52 +0200
From: David Hildenbrand <david@...hat.com>
To: Lance Yang <ioworker0@...il.com>
Cc: Yin Fengwei <fengwei.yin@...el.com>, akpm@...ux-foundation.org,
 willy@...radead.org, sj@...nel.org, baolin.wang@...ux.alibaba.com,
 maskray@...gle.com, ziy@...dia.com, ryan.roberts@....com, 21cnbao@...il.com,
 mhocko@...e.com, zokeefe@...gle.com, shy828301@...il.com,
 xiehuan09@...il.com, libang.li@...group.com, wangkefeng.wang@...wei.com,
 songmuchun@...edance.com, peterx@...hat.com, minchan@...nel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into
 pagewalk loop

On 05.06.24 16:57, Lance Yang wrote:
> On Wed, Jun 5, 2024 at 10:39 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 05.06.24 16:28, David Hildenbrand wrote:
>>> On 05.06.24 16:20, Lance Yang wrote:
>>>> Hi David,
>>>>
>>>> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand <david@...hat.com> wrote:
>>>>>
>>>>> On 21.05.24 06:02, Lance Yang wrote:
>>>>>> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
>>>>>> folios, start the pagewalk first, then call split_huge_pmd_address() to
>>>>>> split the folio.
>>>>>>
>>>>>> Since TTU_SPLIT_HUGE_PMD will no longer perform immediately, we might
>>>>>> encounter a PMD-mapped THP missing the mlock in the VM_LOCKED range during
>>>>>> the page walk. It’s probably necessary to mlock this THP to prevent it from
>>>>>> being picked up during page reclaim.
>>>>>>
>>>>>> Suggested-by: David Hildenbrand <david@...hat.com>
>>>>>> Suggested-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>>>>>> Signed-off-by: Lance Yang <ioworker0@...il.com>
>>>>>> ---
>>>>>
>>>>> [...] again, sorry for the late review.
>>>>
>>>> No worries at all, thanks for taking time to review!
>>>>
>>>>>
>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>> index ddffa30c79fb..08a93347f283 100644
>>>>>> --- a/mm/rmap.c
>>>>>> +++ b/mm/rmap.c
>>>>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>>          if (flags & TTU_SYNC)
>>>>>>                  pvmw.flags = PVMW_SYNC;
>>>>>>
>>>>>> -     if (flags & TTU_SPLIT_HUGE_PMD)
>>>>>> -             split_huge_pmd_address(vma, address, false, folio);
>>>>>> -
>>>>>>          /*
>>>>>>           * For THP, we have to assume the worse case ie pmd for invalidation.
>>>>>>           * For hugetlb, it could be much worse if we need to do pud
>>>>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>>          mmu_notifier_invalidate_range_start(&range);
>>>>>>
>>>>>>          while (page_vma_mapped_walk(&pvmw)) {
>>>>>> -             /* Unexpected PMD-mapped THP? */
>>>>>> -             VM_BUG_ON_FOLIO(!pvmw.pte, folio);
>>>>>> -
>>>>>>                  /*
>>>>>>                   * If the folio is in an mlock()d vma, we must not swap it out.
>>>>>>                   */
>>>>>>                  if (!(flags & TTU_IGNORE_MLOCK) &&
>>>>>>                      (vma->vm_flags & VM_LOCKED)) {
>>>>>>                          /* Restore the mlock which got missed */
>>>>>> -                     if (!folio_test_large(folio))
>>>>>> +                     if (!folio_test_large(folio) ||
>>>>>> +                         (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
>>>>>>                                  mlock_vma_folio(folio, vma);
>>>>>
>>>>> Can you elaborate why you think this would be required? If we would have
>>>>> performed the  split_huge_pmd_address() beforehand, we would still be
>>>>> left with a large folio, no?
>>>>
>>>> Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
>>>>
>>>> After Weifeng's series[1], the kernel supports mlock for PTE-mapped large
>>>> folio, but there are a few scenarios where we don't mlock a large folio, such
>>>> as when it crosses a VM_LOCKed VMA boundary.
>>>>
>>>>     -                     if (!folio_test_large(folio))
>>>>     +                     if (!folio_test_large(folio) ||
>>>>     +                         (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
>>>>
>>>> And this check is just future-proofing and likely unnecessary. If encountering a
>>>> PMD-mapped THP missing the mlock for some reason, we can mlock this
>>>> THP to prevent it from being picked up during page reclaim, since it is fully
>>>> mapped and doesn't cross the VMA boundary, IIUC.
>>>>
>>>> What do you think?
>>>> I would appreciate any suggestions regarding this check ;)
>>>
>>> Reading this patch only, I wonder if this change makes sense in the
>>> context here.
>>>
>>> Before this patch, we would have PTE-mapped the PMD-mapped THP before
>>> reaching this call and skipped it due to "!folio_test_large(folio)".
>>>
>>> After this patch, we either
>>>
>>> a) PTE-remap the THP after this check, but retry and end-up here again,
>>> whereby we would skip it due to "!folio_test_large(folio)".
>>>
>>> b) Discard the PMD-mapped THP due to lazyfree directly. Can that
>>> co-exist with mlock and what would be the problem here with mlock?
>>>
>>>
> 
> Thanks a lot for clarifying!
> 
>>> So if the check is required in this patch, we really have to understand
>>> why. If not, we should better drop it from this patch.
>>>
>>> At least my opinion, still struggling to understand why it would be
>>> required (I have 0 knowledge about mlock interaction with large folios :) ).
>>>
>>
>> Looking at that series, in folio_references_one(), we do
>>
>>                          if (!folio_test_large(folio) || !pvmw.pte) {
>>                                  /* Restore the mlock which got missed */
>>                                  mlock_vma_folio(folio, vma);
>>                                  page_vma_mapped_walk_done(&pvmw);
>>                                  pra->vm_flags |= VM_LOCKED;
>>                                  return false; /* To break the loop */
>>                          }
>>
>> I wonder if we want that here as well now: in case of lazyfree we
>> would not back off, right?
>>
>> But I'm not sure if lazyfree in mlocked areas are even possible.
>>
>> Adding the "!pvmw.pte" would be much clearer to me than the flag check.
> 
> Hmm... How about we drop it from this patch for now, and add it back if needed
> in the future?

If we can rule out that MADV_FREE + mlock() keeps working as expected in 
the PMD-mapped case, we're good.

Can we rule that out? (especially for MADV_FREE followed by mlock())

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ