lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <482fb9e1-7def-48c4-b6b2-e3a5ea2ac36e@arm.com>
Date: Thu, 18 Sep 2025 16:40:30 +0530
From: Dev Jain <dev.jain@....com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Lance Yang <lance.yang@...ux.dev>
Cc: akpm@...ux-foundation.org, david@...hat.com, ziy@...dia.com,
 baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, npache@...hat.com,
 ryan.roberts@....com, baohua@...nel.org, ioworker0@...il.com,
 kirill@...temov.name, hughd@...gle.com, mpenttil@...hat.com,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH mm-new v2 2/2] mm/khugepaged: abort collapse scan on guard
 PTEs


On 18/09/25 3:36 pm, Lorenzo Stoakes wrote:
> On Thu, Sep 18, 2025 at 04:11:21PM +0800, Lance Yang wrote:
>>
>> On 2025/9/18 15:37, Dev Jain wrote:
>>> On 18/09/25 10:34 am, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@...ux.dev>
>>>>
>>>> Guard PTE markers are installed via MADV_GUARD_INSTALL to create
>>>> lightweight guard regions.
>>>>
>>>> Currently, any collapse path (khugepaged or MADV_COLLAPSE) will fail when
>>>> encountering such a range.
>>>>
>>>> MADV_COLLAPSE fails deep inside the collapse logic when trying to swap-in
>>>> the special marker in __collapse_huge_page_swapin().
>>>>
>>>> hpage_collapse_scan_pmd()
>>>>    `- collapse_huge_page()
>>>>        `- __collapse_huge_page_swapin() -> fails!
>>>>
>>>> khugepaged's behavior is slightly different due to its max_ptes_swap
>>>> limit
>>>> (default 64). It won't fail as deep, but it will still needlessly scan up
>>>> to 64 swap entries before bailing out.
>>>>
>>>> IMHO, we can and should detect this much earlier.
>>>>
>>>> This patch adds a check directly inside the PTE scan loop. If a guard
>>>> marker is found, the scan is aborted immediately with
>>>> SCAN_PTE_NON_PRESENT,
>>>> avoiding wasted work.
>>>>
>>>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>>>> Signed-off-by: Lance Yang <lance.yang@...ux.dev>
>>>> ---
>>>>    mm/khugepaged.c | 10 ++++++++++
>>>>    1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 9ed1af2b5c38..70ebfc7c1f3e 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1306,6 +1306,16 @@ static int hpage_collapse_scan_pmd(struct
>>>> mm_struct *mm,
>>>>                        result = SCAN_PTE_UFFD_WP;
>>>>                        goto out_unmap;
>>>>                    }
>>>> +                /*
>>>> +                 * Guard PTE markers are installed by
>>>> +                 * MADV_GUARD_INSTALL. Any collapse path must
>>>> +                 * not touch them, so abort the scan immediately
>>>> +                 * if one is found.
>>>> +                 */
>>>> +                if (is_guard_pte_marker(pteval)) {
>>>> +                    result = SCAN_PTE_NON_PRESENT;
>>>> +                    goto out_unmap;
>>>> +                }
>>>>                    continue;
>>>>                } else {
>>>>                    result = SCAN_EXCEED_SWAP_PTE;
>>>>
>>>>
>>> I would like to hear everyone else's thoughts on
>>> https://lore.kernel.org/linux-mm/750a06dc-db3d-43c6-
>>> b234-95efb393a9df@....com/
>>> wherein I suggest that we should not continue to try collapsing other
>>> regions
>>> but immediately exit. The SCAN_PTE_NON_PRESENT case does not exit.
>> Yes! Let's hear what other folks think on that[1].
>>
>> [1] https://lore.kernel.org/linux-mm/c9d4d761-202f-48ce-8e3d-fb9075671ff3@linux.dev
> Since the code has changed let's discuss on this thread.
>
> Dev - You can have guard regions in a range that prevent one PMD from being
> collapsed, I'm struggling to understand why you'd want to abort the whole
> thing?
>
> Your reasoning there isn't clear at all, so if I had a guard region in one
> page in a giant range I was trying to collapse, you're saying we should
> just abort the whole thing?

My reasoning was that it doesn't seem correct that the user will operate in
any capacity on a guard region when it knows it is a guard region.

But, as you say, we then won't be able to collapse a large region in one go
and will have to do multiple madvise() calls to prevent overlapping with a
guard region. So I agree with you.

>
> I really don't understand why we would do that? You just skip over what you
> can't collapse right?
>
> There's no reason at all to assume that overlapping regions here matter, we
> can't predict how users will use this.

True.

>
> As Lance says, it's best effort. And also note we already do this with UFFD
> WP. And note this is also a non-present, PTE marker.
>
> And also this would change existing behaviour which treats this as a swap
> entry then just fails later down the line right?
>
> So yeah I don't agree, I think it's fine as is, unless I'm missing
> something here.

Thanks for your explanation!

>
> Cheers, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ