linux-kernel - Re: [RFC PATCH 0/3] support large folio for mlock

Message-ID: <362ac9b2-566f-f942-e98a-196ce38b6003@intel.com>
Date:   Sat, 8 Jul 2023 13:35:01 +0800
From:   "Yin, Fengwei" <fengwei.yin@...el.com>
To:     Yu Zhao <yuzhao@...gle.com>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        <ryan.roberts@....com>, <shy828301@...il.com>,
        <akpm@...ux-foundation.org>, <willy@...radead.org>,
        <david@...hat.com>
Subject: Re: [RFC PATCH 0/3] support large folio for mlock



On 7/8/2023 1:06 PM, Yu Zhao wrote:
> On Fri, Jul 7, 2023 at 11:01 PM Yin, Fengwei <fengwei.yin@...el.com> wrote:
>>
>>
>>
>> On 7/8/2023 12:45 PM, Yu Zhao wrote:
>>> On Fri, Jul 7, 2023 at 10:52 AM Yin Fengwei <fengwei.yin@...el.com> wrote:
>>>>
>>>> Yu mentioned at [1] that mlock() can't be applied to large folios.
>>>>
>>>> I studied the related code and here is my understanding:
>>>> - For RLIMIT_MEMLOCK, there is no problem, because RLIMIT_MEMLOCK
>>>>   accounting is not tied to the underlying pages. Mlocking or
>>>>   munlocking the underlying pages does not affect the RLIMIT_MEMLOCK
>>>>   accounting, which is always correct.
>>>>
>>>> - For keeping the page in RAM, there is no problem either. At least
>>>>   during try_to_unmap_one(), once it detects that the VMA has the
>>>>   VM_LOCKED bit set in vm_flags, the folio is kept whether or not
>>>>   the folio is mlocked.
>>>>
>>>> So mlock works functionally for large folios. But it is not optimized,
>>>> because page reclaim needs to scan these large folios and may split
>>>> them.
>>>>
>>>> This series classifies large folios for mlock into two types:
>>>>   - The large folio is within a VM_LOCKED VMA range
>>>>   - The large folio crosses a VM_LOCKED VMA boundary
>>>>
>>>> For the first type, we mlock the large folio so page reclaim will skip it.
>>>> For the second type, we don't mlock the large folio. It can be
>>>> picked by page reclaim and split, so the pages outside the VM_LOCKED
>>>> VMA range can be reclaimed/released.
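
[Editor's sketch] The two folio types described above can be modeled in plain userspace C. This is only an illustration under stated assumptions: `struct range`, `classify()` and the enum values are hypothetical names, not the API added by patch1, and byte ranges stand in for the kernel's folio and VMA structures.

```c
#include <stdbool.h>

/* Hypothetical model: a half-open byte range [start, end). */
struct range {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical outcome names; they do not appear in the patch series. */
enum mlock_action {
    MLOCK_FOLIO,       /* folio entirely inside the VM_LOCKED VMA */
    LEAVE_TO_RECLAIM,  /* folio crosses the VMA boundary */
    NOT_IN_VMA         /* folio does not touch the VMA at all */
};

/* True when the folio lies entirely inside the VMA. */
static bool folio_within_vma(struct range folio, struct range vma)
{
    return folio.start >= vma.start && folio.end <= vma.end;
}

/* True when the folio and the VMA share at least one byte. */
static bool folio_overlaps_vma(struct range folio, struct range vma)
{
    return folio.start < vma.end && vma.start < folio.end;
}

/* Decide how a large folio is treated relative to one VM_LOCKED VMA. */
static enum mlock_action classify(struct range folio, struct range vma)
{
    if (folio_within_vma(folio, vma))
        return MLOCK_FOLIO;
    if (folio_overlaps_vma(folio, vma))
        return LEAVE_TO_RECLAIM;
    return NOT_IN_VMA;
}
```

In this model, a 64K folio at [0x10000, 0x20000) inside a VMA covering [0x0, 0x40000) is classified as MLOCK_FOLIO, while the same folio against a VMA starting at 0x18000 is left to reclaim.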
>>>
>>> This is a sound design, which is also what I have in mind. I see the
>>> rationales are being spelled out in this thread, and hopefully
>>> everyone can be convinced.
>>>
>>>> patch1 introduces an API to check whether a large folio is in a VMA range.
>>>> patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support
>>>> large folio mlock/munlock.
>>>> patch3 makes the mlock/munlock syscalls support large folios.
>>>
>>> Could you tidy up the last patch a little bit? E.g., saying "mlock the
>>> 4K folio" is obviously not the best idea.
>>>
>>> And if it's possible, make the loop just look like before, i.e.,
>>>
>>>   if (!can_mlock_entire_folio())
>>>     continue;
>>>   if (vma->vm_flags & VM_LOCKED)
>>>     mlock_folio_range();
>>>   else
>>>     munlock_folio_range();
>> This can leave a large folio mlocked even after user space calls
>> munlock() on the range. Consider the following case:
>>   1. mlock() a 64K range; the underlying 64K large folio is mlocked.
>>   2. mprotect() the first 32K of the range to a different prot, which
>>      triggers a VMA split.
>>   3. munlock() the 64K range. As the 64K large folio is not within
>>      either of the two new VMAs, it will not be munlocked and can only
>>      be reclaimed after it's unmapped from both VMAs instead of after
>>      the range is munlocked.
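
[Editor's sketch] The three-step case above can be traced with a small userspace model. It assumes the munlock walk visits each VMA in turn and skips any folio that is not entirely inside the VMA being visited, as in the suggested loop. `struct span`, `span_contains()` and `mlocked_after_munlock()` are hypothetical names for illustration, not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a half-open byte range [start, end). */
struct span {
    unsigned long start;
    unsigned long end;
};

/* True when inner lies entirely inside outer. */
static bool span_contains(struct span outer, struct span inner)
{
    return inner.start >= outer.start && inner.end <= outer.end;
}

/*
 * Model of a munlock walk over the VMAs covering a range, using the rule
 * "skip the folio unless it is entirely inside this VMA".
 * Returns the folio's mlocked state after the walk.
 */
static bool mlocked_after_munlock(struct span folio, const struct span *vmas,
                                  size_t nr_vmas, bool mlocked)
{
    for (size_t i = 0; i < nr_vmas; i++) {
        if (!span_contains(vmas[i], folio))
            continue;       /* folio crosses this VMA's boundary: skipped */
        mlocked = false;    /* folio fully inside this VMA: munlocked */
    }
    return mlocked;
}
```

With a single 64K VMA the folio is munlocked. After the mprotect() split into two 32K VMAs, neither VMA contains the whole 64K folio, so the model leaves it mlocked, which is the situation the case describes.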
> 
> I understand. I'm asking to factor the code, not to change the logic.
Oh, sorry. I misunderstood the code snippet you showed. I will address
this in the next version. Thanks.


Regards
Yin, Fengwei