Message-ID: <ioo6sjwlznvfmv7kupubkqk6qc6lec7kczius7g27o4kpp3z5p@druouu5ziylf>
Date: Mon, 15 Sep 2025 13:01:26 +0100
From: Kiryl Shutsemau <kas@...nel.org>
To: David Hildenbrand <david@...hat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Nico Pache <npache@...hat.com>, linux-mm@...ck.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, ziy@...dia.com,
baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
ryan.roberts@....com, dev.jain@....com, corbet@....net,
rostedt@...dmis.org, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
aarcange@...hat.com, raquini@...hat.com, anshuman.khandual@....com,
catalin.marinas@....com, tiwai@...e.de, will@...nel.org,
dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
rdunlap@...radead.org, hughd@...gle.com, richard.weiyang@...il.com,
lance.yang@...ux.dev, vbabka@...e.cz, rppt@...nel.org, jannh@...gle.com,
pfalcato@...e.de
Subject: Re: [PATCH v11 00/15] khugepaged: mTHP support
On Mon, Sep 15, 2025 at 01:45:39PM +0200, David Hildenbrand wrote:
> On 15.09.25 13:35, Lorenzo Stoakes wrote:
> > On Mon, Sep 15, 2025 at 01:29:22PM +0200, David Hildenbrand wrote:
> > > On 15.09.25 13:23, Lorenzo Stoakes wrote:
> > > > On Mon, Sep 15, 2025 at 01:14:32PM +0200, David Hildenbrand wrote:
> > > > > On 15.09.25 13:02, Lorenzo Stoakes wrote:
> > > > > > On Mon, Sep 15, 2025 at 12:52:03PM +0200, David Hildenbrand wrote:
> > > > > > > On 15.09.25 12:43, Lorenzo Stoakes wrote:
> > > > > > > > On Mon, Sep 15, 2025 at 12:22:07PM +0200, David Hildenbrand wrote:
> > > > > > > > >
> > > > > > > > > 0 -> ~100% used (~0% none)
> > > > > > > > > 1 -> ~50% used (~50% none)
> > > > > > > > > 2 -> ~25% used (~75% none)
> > > > > > > > > 3 -> ~12.5% used (~87.5% none)
> > > > > > > > > 4 -> ~11.25% used (~88,75% none)
> > > > > > > > > ...
> > > > > > > > > 10 -> ~0% used (~100% none)
> > > > > > > >
> > > > > > > > Oh and shouldn't this be inverted?
> > > > > > > >
> > > > > > > > 0 eagerness = we eat up all none PTE entries? Isn't that pretty eager? :P
> > > > > > > > 10 eagerness = we aren't eager to eat up none PTE entries at all?
> > > > > > > >
> > > > > > > > Or am I being dumb here?
> > > > > > >
> > > > > > > Good question.
> > > > > > >
> > > > > > > For swappiness it's: 0 -> no swap (conservative)
> > > > > > >
> > > > > > > So intuitively I assumed: 0 -> no pte_none (conservative)
> > > > > > >
> > > > > > > You're the native speaker, so you tell me :)
> > > > > >
> > > > > > To me this is about 'eagerness to consume empty PTE entries' so 10 is more
> > > > > > eager, 0 is not eager at all, i.e. inversion of what you suggest :)
> > > > >
> > > > > Just so we are on the same page: it is about "eagerness to collapse", right?
> > > > >
> > > > > Wouldn't a 0 mean "I am not eager, I will not waste any memory, I am very
> > > > > careful and bail out on any pte_none" vs. 10 meaning "I am very eager, I
> > > > > will collapse no matter what I find in the page table, waste as much memory
> > > > > as I want"?
> > > >
> > > > Yeah, this is my understanding of your scale, or is my understanding also
> > > > inverted? :)
> > > >
> > > > Right now it's:
> > > >
> > > > eagerness max_ptes_none
> > > >
> > > > 0 -> 511
> > > > ...
> > > > 10 -> 0
> > > >
> > > > Right?
> > >
> > > Just so we are on the same page, this is what I had:
> > >
> > > 0 -> ~100% used (~0% none)
> > >
> > > So "0" -> 0 pte_none or 512 used.
> > >
> > > (note the used vs. none)
> >
> > OK right so we're talking about the same thing, I guess?
> >
> > I was confused partly because of the scale, because weren't people setting
> > this parameter to low values in practice?
> >
> > And now we make it so we have equivalent of:
> >
> > 0 -> 0
> > 1 -> 256
> > 2 -> 384
>
> Ah, there is the problem, that's not what I had in mind.
>
> 0 -> ~100% used (~0% none)
> ...
> 8 -> ~87.5% used (~12.5% none)
> 9 -> ~75% used (~25% none)
> 9 -> ~50% used (~50% none)
> 10 -> ~0% used (~100% none)
>
> Hopefully I didn't mess it up again.

I think this kind of table is fine for the initial implementation of the
knob, but we don't want to document it to userspace like this.

I think we want to be strategically ambiguous about what exactly the knob
does, so the kernel can evolve its meaning over time.

We don't want to repeat the problem we have with max_ptes_none, which is
too prescriptive and took on additional meaning with the introduction of
the shrinker.

As the kernel evolves, we want the ability to adjust the meaning and keep
the knob useful.

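
As a rough illustration of keeping the mapping internal, the sketch below
hides the eagerness-to-max_ptes_none translation behind a single helper, so
the table can change without touching the userspace-visible 0..10 scale.
Everything here is an assumption made for illustration: the helper name
eagerness_to_max_ptes_none(), the EAGERNESS_MAX constant, the 512-entry PMD
(4 KiB base pages), and the halving scheme for levels 1..9. Only the
endpoints (0 tolerates no none PTEs, 10 tolerates up to 511) follow the
tables quoted above; the intermediate values are not the exact table from
the thread.

```c
/* Hypothetical sketch, not kernel code. Names and values are assumptions. */

#define HPAGE_PMD_NR  512u /* PTEs covered by one PMD, assuming 4 KiB pages */
#define EAGERNESS_MAX 10u  /* assumed upper bound of the userspace knob */

/*
 * Translate the coarse eagerness level into the number of none PTEs that a
 * collapse may tolerate. The policy lives only here, so it can be retuned
 * later without changing what userspace writes to the knob.
 */
static unsigned int eagerness_to_max_ptes_none(unsigned int eagerness)
{
	/* Level 0: bail out on any pte_none. */
	if (eagerness == 0)
		return 0;

	/* Level 10 (or anything larger): collapse regardless of none PTEs. */
	if (eagerness >= EAGERNESS_MAX)
		return HPAGE_PMD_NR - 1;

	/*
	 * Illustrative halving scheme for levels 1..9:
	 * 9 -> 256 (50%), 8 -> 128 (25%), 7 -> 64 (12.5%), ..., 1 -> 1.
	 */
	return HPAGE_PMD_NR >> (EAGERNESS_MAX - eagerness);
}
```

With this shape, callers only ever see the helper, and documentation can
describe the knob as "higher means more eager to collapse" without promising
specific thresholds.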
--
Kiryl Shutsemau / Kirill A. Shutemov