[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <12ec9700-64c2-4404-bdb6-5fbddfb81164@lucifer.local>
Date: Mon, 15 Sep 2025 11:30:08 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
david@...hat.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
Liam.Howlett@...cle.com, ryan.roberts@....com, dev.jain@....com,
corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kas@...nel.org, aarcange@...hat.com, raquini@...hat.com,
anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz,
cl@...two.org, jglisse@...gle.com, surenb@...gle.com,
zokeefe@...gle.com, hannes@...xchg.org, rientjes@...gle.com,
mhocko@...e.com, rdunlap@...radead.org, hughd@...gle.com,
richard.weiyang@...il.com, lance.yang@...ux.dev, vbabka@...e.cz,
rppt@...nel.org, jannh@...gle.com, pfalcato@...e.de
Subject: Re: [PATCH v11 06/15] khugepaged: introduce collapse_max_ptes_none
helper function
On Fri, Sep 12, 2025 at 05:26:03PM -0600, Nico Pache wrote:
> On Fri, Sep 12, 2025 at 7:36 AM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> >
> > On Thu, Sep 11, 2025 at 09:28:01PM -0600, Nico Pache wrote:
> > > The current mechanism for determining mTHP collapse scales the
> > > khugepaged_max_ptes_none value based on the target order. This
> > > introduces an undesirable feedback loop, or "creep", when max_ptes_none
> > > is set to a value greater than HPAGE_PMD_NR / 2.
> > >
> > > With this configuration, a successful collapse to order N will populate
> > > enough pages to satisfy the collapse condition on order N+1 on the next
> > > scan. This leads to unnecessary work and memory churn.
> > >
> > > To fix this issue introduce a helper function that caps the max_ptes_none
> > > to HPAGE_PMD_NR / 2 - 1 (255 on 4k page size). The function also scales
> > > the max_ptes_none number by the (PMD_ORDER - target collapse order).
> >
> > I would say very clearly that this is only in the mTHP case.
>
> ack, I stole most of the verbiage here from other notes I've
> previously written, but it can be improved.
Thanks.
>
> >
> >
> > >
> > > Signed-off-by: Nico Pache <npache@...hat.com>
> >
> > Hmm I thought we were going to wait for David to investigate different
> > approaches to this?
> >
> > This is another issue with quickly going to another iteration. Though I do think
> > David explicitly said he'd come back with a solution?
>
> Sorry I thought that was being done in lockstep. The last version was
> about a month ago and I had a lot of changes queued up. Now that we
> have collapse_max_pte_none() David has a much easier entry point to
> work off :)
It'd be less problematic if this was an RFC but better to ensure all discussion
is really complete before next revision for everybody's sanity.
The ideal solution here would be to just ask David if he minded you respinning
before that was resovled I think.
But I do think generally, as I said in [0] that it'd make our lives easier if
you left perhaps a day or two after the conversation has settled just to be sure
that's that, and obviously directly ask about things like this.
I can only politely ask for this, obviously you're free to do whatever... :)
[0]: https://lore.kernel.org/all/2d5270e4-0de3-4ea3-87a4-96254eb6d446@lucifer.local/
>
> I think he will still need this groundwork for the solution he is
> working on with "eagerness". if 10 -> 511, and 9 ->255, ..., 0 -> 0.
> It will still have to do the scaling. Although I believe 0-10 should
> be more like 0-5 mapping to 0,32,64,128,255,511
Yeah, let's leave that discussion to the subthread on that.
>
> >
> > So I'm not sure why we're seeing this solution here? Unless I'm missing
> > something?
> >
> > > ---
> > > mm/khugepaged.c | 22 +++++++++++++++++++++-
> > > 1 file changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index b0ae0b63fc9b..4587f2def5c1 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -468,6 +468,26 @@ void __khugepaged_enter(struct mm_struct *mm)
> > > wake_up_interruptible(&khugepaged_wait);
> > > }
> > >
> > > +/* Returns the scaled max_ptes_none for a given order.
> >
> > We don't start comments at the /*, please use a normal comment format like:
> ack
Thanks
> >
> > /*
> > * xxxx
> > */
> >
> > > + * Caps the value to HPAGE_PMD_NR/2 - 1 in the case of mTHP collapse to prevent
> >
> > This is super unclear.
> >
> > It start with 'caps the xxx' which seems like you're talking generally.
> >
> > You should say very clearly 'For PMD allocations we apply the
> > khugepaged_max_ptes_none parameter as normal. For mTHP ... [details about mTHP].
> ack I will clean this up.
Thanks.
> >
> > > + * a feedback loop. If max_ptes_none is greater than HPAGE_PMD_NR/2, the value
> > > + * would lead to collapses that introduces 2x more pages than the original
> > > + * number of pages. On subsequent scans, the max_ptes_none check would be
> > > + * satisfied and the collapses would continue until the largest order is reached
> > > + */
> >
> > This is a super vauge explanation. Please describe the issue with creep more
> > clearly.
> ok I will try to come up with something clearer.
Thanks.
> >
> > Also aren't we saying that 511 or 0 are the sensible choices? But now somehow
> > that's not the case?
> Oh I stated I wanted to propose this, and although there was some
> pushback I still thought it deserved another attempt. This still
> allows for some configurability, and with David's eagerness toggle
> this still seems to fit nicely.
> >
> > You're also not giving a kdoc info on what this returns.
> Ok I'll add a kdoc here, why this function in particular, I'm trying
> to understand why we dont add kdocs on other functions?
> >
> > > +static int collapse_max_ptes_none(unsigned int order)
> >
> > It's a problem that existed already, but khugepaged_max_ptes_none is an unsigned
> > int and this returns int.
> >
> > Maybe we should fix this while we're at it...
> ack
Thanks
> >
> > > +{
> > > + int max_ptes_none;
> > > +
> > > + if (order != HPAGE_PMD_ORDER &&
> > > + khugepaged_max_ptes_none >= HPAGE_PMD_NR/2)
> > > + max_ptes_none = HPAGE_PMD_NR/2 - 1;
> > > + else
> > > + max_ptes_none = khugepaged_max_ptes_none;
> > > + return max_ptes_none >> (HPAGE_PMD_ORDER - order);
> > > +
> > > +}
> > > +
> >
> > I really don't like this formulation, you're making it unnecessarily unclear and
> > now, for the super common case of PMD size, you have to figure out 'oh it's this
> > second branch and we're subtracting HPAGE_PMD_ORDER from HPAGE_PMD_ORDER so just
> > return khugepaged_max_ptes_none'. When we could... just return it no?
> >
> > So something like:
> >
> > #define MAX_PTES_NONE_MTHP_CAP (HPAGE_PMD_NR / 2 - 1)
> >
> > static unsigned int collapse_max_ptes_none(unsigned int order)
> > {
> > unsigned int max_ptes_none_pmd;
> >
> > /* PMD-sized THPs behave precisely the same as before. */
> > if (order == HPAGE_PMD_ORDER)
> > return khugepaged_max_ptes_none;
> >
> > /*
> > * Bizarrely, this is expressed in terms of PTEs were this PMD-sized.
> > * For the reasons stated above, we cap this value in the case of mTHP.
> > */
> > max_ptes_none_pmd = MIN(MAX_PTES_NONE_MTHP_CAP,
> > khugepaged_max_ptes_none);
> >
> > /* Apply PMD -> mTHP scaling. */
> > return max_ptes_none >> (HPAGE_PMD_ORDER - order);
> > }
> yeah that's much cleaner thanks!
:) Cool thanks!
> >
> > > void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > vm_flags_t vm_flags)
> > > {
> > > @@ -554,7 +574,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > > struct folio *folio = NULL;
> > > pte_t *_pte;
> > > int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> > > - int scaled_max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> > > + int scaled_max_ptes_none = collapse_max_ptes_none(order);
> > > const unsigned long nr_pages = 1UL << order;
> > >
> > > for (_pte = pte; _pte < pte + nr_pages;
> > > --
> > > 2.51.0
> > >
> >
> > Thanks, Lorenzo
> >
>
Cheers, Lorenzo
Powered by blists - more mailing lists