Message-ID: <3c572565-0b21-4136-b0e0-59a5ed858104@redhat.com>
Date: Wed, 8 Oct 2025 10:18:09 +0200
From: David Hildenbrand <david@...hat.com>
To: Lance Yang <lance.yang@...ux.dev>, akpm@...ux-foundation.org
Cc: lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, baohua@...nel.org,
baolin.wang@...ux.alibaba.com, dev.jain@....com, hughd@...gle.com,
ioworker0@...il.com, kirill@...temov.name, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, mpenttil@...hat.com, npache@...hat.com,
ryan.roberts@....com, ziy@...dia.com, richard.weiyang@...il.com
Subject: Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on
non-swap entries
On 08.10.25 05:26, Lance Yang wrote:
> From: Lance Yang <lance.yang@...ux.dev>
>
> Currently, special non-swap entries (like PTE markers) are not caught
> early in hpage_collapse_scan_pmd(), leading to failures deep in the
> swap-in logic.
>
> __collapse_huge_page_swapin(), although documented to "Bring missing
> pages in from swap", will handle other entry types as well.
>
> As analyzed by David[1], we could have ended up with the following
> entry types right before do_swap_page():
>
> (1) Migration entries. We would have waited.
> -> Maybe worth it to wait, maybe not. We suspect we don't stumble
> into that often enough to care. We could always unlock this
> separately later.
>
> (2) Device-exclusive entries. We would have converted to non-exclusive.
> -> See make_device_exclusive(), we cannot tolerate PMD entries and
> have to split them through FOLL_SPLIT_PMD. As came up during
> a recent discussion, collapsing here is actually
> counter-productive, because the next conversion will PTE-map
> it again.
> -> Ok to not collapse.
>
> (3) Device-private entries. We would have migrated to RAM.
> -> Device-private still does not support THPs, so collapsing right
> now just means that the next device access would split the
> folio again.
> -> Ok to not collapse.
>
> (4) HWPoison entries
> -> Cannot collapse
>
> (5) Markers
> -> Cannot collapse
>
> First, this patch adds an early check for these non-swap entries. If
> any one is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
> since we are in the swap pte branch.
>
> Second, as Wei pointed out[3], we may still encounter a non-swap
> entry, since we drop and re-acquire the mmap lock before
> __collapse_huge_page_swapin(). To handle this, we also add a
> non_swap_entry() check there.
>
> Note that we can later unlock whatever we really need, without
> accounting it towards max_swap_ptes.
>
> [1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
> [3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
>
> Acked-by: David Hildenbrand <david@...hat.com>
> Reviewed-by: Wei Yang <richard.weiyang@...il.com>
> Reviewed-by: Dev Jain <dev.jain@....com>
> Suggested-by: David Hildenbrand <david@...hat.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> Signed-off-by: Lance Yang <lance.yang@...ux.dev>
> ---
Sorry for not replying earlier to your other mail.
LGTM.
We can always handle migration entries later if this turns out to be a
problem (this time, in a clean way ...) and not count them towards
actual "swap" entries.
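To make the behaviour described in the patch concrete, here is a minimal user-space sketch of its two checks: the early abort in the scan and the re-check before swap-in. It is a simplified model under stated assumptions, not the kernel code: the enum tags, is_non_swap_entry(), scan_pte_range() and swapin_range() are all hypothetical, and the real implementation works on pte_t values and calls non_swap_entry() on the decoded swap entry. It also shows the point above that migration entries currently abort the scan rather than being counted towards max_swap_ptes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model: each PTE is reduced to a tag.
 * NOT the kernel's representation; real code decodes pte_t values
 * and calls non_swap_entry() on the resulting swp_entry_t.
 */
enum pte_kind {
	PTE_NONE,		/* empty PTE */
	PTE_PRESENT,		/* page is mapped */
	PTE_SWAP,		/* genuine swap entry */
	PTE_MIGRATION,		/* (1) non-swap entry */
	PTE_DEV_EXCLUSIVE,	/* (2) non-swap entry */
	PTE_DEV_PRIVATE,	/* (3) non-swap entry */
	PTE_HWPOISON,		/* (4) non-swap entry */
	PTE_MARKER,		/* (5) non-swap entry */
};

/* Stand-ins named after khugepaged's scan results (assumed subset). */
enum scan_result {
	SCAN_SUCCEED,
	SCAN_EXCEED_SWAP_PTE,
	SCAN_PTE_NON_PRESENT,
};

static bool is_non_swap_entry(enum pte_kind k)
{
	return k == PTE_MIGRATION || k == PTE_DEV_EXCLUSIVE ||
	       k == PTE_DEV_PRIVATE || k == PTE_HWPOISON ||
	       k == PTE_MARKER;
}

/*
 * Scan pass: a non-swap entry aborts immediately, before it could be
 * counted toward max_swap_ptes. (max_ptes_none handling is omitted.)
 */
static enum scan_result scan_pte_range(const enum pte_kind *ptes, size_t n,
				       size_t max_swap_ptes)
{
	size_t swap_count = 0;

	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ptes[i] == PTE_NONE || ptes[i] == PTE_PRESENT)
			continue;
		if (is_non_swap_entry(ptes[i]))
			return SCAN_PTE_NON_PRESENT;
		/* Only genuine swap entries are counted. */
		if (++swap_count > max_swap_ptes)
			return SCAN_EXCEED_SWAP_PTE;
	}
	return SCAN_SUCCEED;
}

/*
 * Swap-in pass: the lock was dropped after the scan, so every entry is
 * re-checked. Setting PTE_PRESENT merely stands in for do_swap_page().
 * Entries swapped in before an abort are left as they are.
 */
static enum scan_result swapin_range(enum pte_kind *ptes, size_t n)
{
	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (is_non_swap_entry(ptes[i]))
			return SCAN_PTE_NON_PRESENT;
		if (ptes[i] == PTE_SWAP)
			ptes[i] = PTE_PRESENT;
	}
	return SCAN_SUCCEED;
}
```

For example, in this model a range containing a PTE marker yields SCAN_PTE_NON_PRESENT even with a generous max_swap_ptes, while two genuine swap entries with max_swap_ptes = 1 yield SCAN_EXCEED_SWAP_PTE.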
--
Cheers
David / dhildenb