Message-ID: <CAAa6QmQY=0J=L2=NaYfwHeqV=JtknA2wwPvNJBvWreq5GXXv-g@mail.gmail.com>
Date: Wed, 17 Sep 2025 06:56:06 -0700
From: "Zach O'Keefe" <zokeefe@...gle.com>
To: Kiryl Shutsemau <kirill@...temov.name>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
Barry Song <baohua@...nel.org>, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2] mm/khugepaged: Do not fail collapse_pte_mapped_thp() on SCAN_PMD_NULL
On Wed, Sep 17, 2025 at 3:52 AM Kiryl Shutsemau <kirill@...temov.name> wrote:
>
> On Tue, Sep 16, 2025 at 11:06:30AM -0700, Zach O'Keefe wrote:
> > So, since we are trying to aim for consistency here, I think we ought
> > to also support the anonymous case.
> >
> > I don't have a patch, but can spot at least two things we'd need to adjust:
> >
> > First, we are defeated by the check in __thp_vma_allowable_orders();
> >
> >         /*
> >          * THPeligible bit of smaps should show 1 for proper VMAs even
> >          * though anon_vma is not initialized yet.
> >          *
> >          * Allow page fault since anon_vma may be not initialized until
> >          * the first page fault.
> >          */
> >         if (!vma->anon_vma)
> >                 return (smaps || in_pf) ? orders : 0;
> >
> > I think we can probably just delete that check, but would need to confirm.
>
> Do you want MADV_COLLAPSE to work on VMAs that never got a page fault?
>
> I think it should be fine as long as we agree that MADV_COLLAPSE implies
> memory population. I think it should, but I want to be sure we are on
> the same page.
Exactly. I'm always a little embarrassed when telling people how to
successfully use MADV_COLLAPSE: "oh, but make sure you fault at
least one page beforehand because of ~reasons~".
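For anyone following along, here is a rough userspace sketch of that
workaround as it stands today. It is only an illustration, not something
taken from the patch: the helper name collapse_after_touch() is made up,
the 2 MiB PMD size is an assumption (x86-64 with 4K base pages), and the
fallback MADV_COLLAPSE value is the one I believe the uapi headers use
since 6.1. The call can still fail for other reasons (THP disabled, old
kernel), so the helper just reports the errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* assumed uapi value; present in Linux 6.1+ headers */
#endif

/* Assumption: PMD-sized THP is 2 MiB (x86-64 with 4K base pages). */
#define ASSUMED_PMD_SIZE (2UL << 20)

/*
 * Hypothetical helper: map an anonymous region, fault in one page so the
 * VMA gets an anon_vma, then ask for a collapse of one PMD-sized range.
 * Returns 0 on success or -errno on failure.
 */
static int collapse_after_touch(void)
{
	size_t len = 2 * ASSUMED_PMD_SIZE; /* over-allocate so we can align */
	char *raw, *aligned;
	int rc;

	raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -errno;

	/* Round up to the next PMD-aligned address inside the mapping. */
	aligned = (char *)(((uintptr_t)raw + ASSUMED_PMD_SIZE - 1) &
			   ~(uintptr_t)(ASSUMED_PMD_SIZE - 1));

	/* The "~reasons~" step: touch one page before collapsing. */
	aligned[0] = 1;

	rc = madvise(aligned, ASSUMED_PMD_SIZE, MADV_COLLAPSE) ? -errno : 0;

	munmap(raw, len);
	return rc;
}
```

If MADV_COLLAPSE implied population, the aligned[0] = 1 line is the part
callers would no longer need.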
> It also brings up a question about holes in files on MADV_COLLAPSE. We
> might want to populate them too. But it means the logic between
> MADV_COLLAPSE and khugepaged will diverge. It requires larger
> refactoring.
Yeah, and taking a more thorough look, I am perhaps reminded why I didn't
pursue this yet.
> > And second, madvise_collapse() doesn't route SCAN_PMD_NULL to
> > collapse_pte_mapped_thp(). I think we just need to audit places where
> > we return this code, to make sure it's faithfully describing a
> > situation where we can go ahead and install a new pmd. As a hasty
> > check, the return codes in check_pmd_state() don't look to follow
> > that, with !present and pmd_bad() returning SCAN_PMD_NULL. Likewise,
> > there are many underlying failure reasons for
> > pte_offset_map_ro_nolock()=>___pte_offset_map() that aren't "no PMD
> > entry".
>
> Sounds like a plan :)
:) Frankly, I don't have cycles to tackle this at the moment, and it's
unfair to push the work on you, given it's non-trivial, so you can have my
Reviewed-by: Zach O'Keefe <zokeefe@...gle.com>
for this patch, though Andrew has already taken it.
Hopefully I can take a look and sneak improvements into 6.18, but I
wouldn't hold my breath.
> --
> Kiryl Shutsemau / Kirill A. Shutemov