[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a4b053c7-6eb0-41fd-bea0-7231300199cb@lucifer.local>
Date: Thu, 18 Sep 2025 13:27:30 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Kiryl Shutsemau <kirill@...temov.name>
Cc: Zach O'Keefe <zokeefe@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2] mm/khugepaged: Do not fail collapse_pte_mapped_thp()
on SCAN_PMD_NULL
On Wed, Sep 17, 2025 at 11:52:34AM +0100, Kiryl Shutsemau wrote:
> On Tue, Sep 16, 2025 at 11:06:30AM -0700, Zach O'Keefe wrote:
> > So, since we are trying to aim for consistency here, I think we ought
> > to also support the anonymous case.
> >
> > I don't have a patch, but can spot at least two things we'd need to adjust:
> >
> > First, we are defeated by the check in __thp_vma_allowable_orders();
> >
> > /*
> > * THPeligible bit of smaps should show 1 for proper VMAs even
> > * though anon_vma is not initialized yet.
> > *
> > * Allow page fault since anon_vma may be not initialized until
> > * the first page fault.
> > */
> > if (!vma->anon_vma)
> > return (smaps || in_pf) ? orders : 0;
> >
> > I think we can probably just delete that check, but would need to confirm.
>
> Do you want MADV_COLLAPSE to work on VMAs that never got a page fault?
>
> I think it should be fine as long as we agree that MADV_COLLAPSE implies
> memory population. I think it should, but I want to be sure we are on
> the same page.
I'm definitely in favour of MADV_COLLAPSE not requiring pre-faulting like
that.
As long as nothing is assuming that an anon_vma already exists, which
surely is not possible as we ignore in smaps/pf case.
(Oh how I hate how anon_vma works)
Though would this not also impact khugepaged?
thp_vma_allowable_order() called with TVA_KHUGEPAGED are in:
- khugepaged_enter_vma() - does't matter really, one VMA having anon_vma or
not should be immaterial to adding the mm to khugepaged, not hugely
likely _none_ would have anon_vma...
- khugepaged_scan_mm_slot() - well, we will require at least 1 PTE for this
to succeed anyway right? We might waste some time though. But probably
not much. hpage_collapse_scan_pmd() -> collapse_huge_page() tries an
anon VMA lock, but would have to have found PTEs to succeed...
So actually it is fine to just remove the check.
>
> I also brings a question on holes in the files on MADV_COLLAPSE. We
> might want to populate them too. But it means the logic between
> MADV_COLLAPSE and khugepaged will diverge. It requires larger
> refactoring.
I think all of THP requires a larger refactoring :) but interesting point.
>
> > And second, madvise_collapse() doesn't route SCAN_PMD_NULL to
> > collapse_pte_mapped_thp(). I think we just need to audit places where
> > we return this code, to make sure it's faithfully describing a
> > situation where we can go ahead and install a new pmd. As a hasty
> > check, the return codes in check_pmd_state() don't look to follow
> > that, with !present and pmd_bad() returning SCAN_PMD_NULL. Likewise,
> > there are many underlying failure reasons for
> > pte_offset_map_ro_nolock()=>___pte_offset_map() that aren't "no PMD
> > entry".
>
> Sounds like a plan :)
There being differences is kind of horrible. So yes indeed, one to look at
:)
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
Cheers, Lorenzo
Powered by blists - more mailing lists