Message-ID: <dcfc7e27-d3c8-4fd0-8b7b-ce8f5051d597@lucifer.local>
Date: Fri, 12 Sep 2025 19:21:49 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Kiryl Shutsemau <kas@...nel.org>, Nico Pache <npache@...hat.com>,
        linux-mm@...ck.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
        ziy@...dia.com, baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
        ryan.roberts@....com, dev.jain@....com, corbet@....net,
        rostedt@...dmis.org, mhiramat@...nel.org,
        mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
        baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
        wangkefeng.wang@...wei.com, usamaarif642@...il.com,
        sunnanyong@...wei.com, vishal.moola@...il.com,
        thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
        aarcange@...hat.com, raquini@...hat.com, anshuman.khandual@....com,
        catalin.marinas@....com, tiwai@...e.de, will@...nel.org,
        dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
        jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
        hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
        rdunlap@...radead.org, hughd@...gle.com, richard.weiyang@...il.com,
        lance.yang@...ux.dev, vbabka@...e.cz, rppt@...nel.org,
        jannh@...gle.com, pfalcato@...e.de
Subject: Re: [PATCH v11 00/15] khugepaged: mTHP support

On Fri, Sep 12, 2025 at 07:53:22PM +0200, David Hildenbrand wrote:
> On 12.09.25 17:51, Lorenzo Stoakes wrote:
> > With all this stuff said, do we have an actual plan for what we intend to do
> > _now_?
>
> Oh no, now I have to use my brain and it's Friday evening.

I apologise :)

>
> >
> > As Nico has implemented a basic solution here that we all seem to agree is not
> > what we want.
> >
> > Without needing special new hardware or major reworks, what would this parameter
> > look like?
> >
> > What would the heuristics be? What about the eagerness scales?
> >
> > I'm but a simple kernel developer,
>
> :)
>
> > and interested in simple pragmatic stuff :)
> > do you have a plan right now David?
>
> Ehm, if you ask me that way ...
>
> >
> > Maybe we can start with something simple like a rough percentage per eagerness
> > entry that then gets scaled based on utilisation?
>
> ... I think we should probably:
>
> 1) Start with something very simple for mTHP that doesn't lock us into any particular direction.

Yes.

>
> 2) Add an "eagerness" parameter with fixed scale and use that for mTHP as well

Yes, I think we're all pretty much on board with that, it seems!

>
> 3) Improve that "eagerness" algorithm using a dynamic scale or #whatever

Right, I feel like we could start with some very simple linear thing here and
later maybe refine it?

>
> 4) Solve world peace and world hunger

Yes! That would be pretty great ;)

>
> 5) Connect it all to memory pressure / reclaim / shrinker / heuristics / hw hotness / #whatever

I think these are TODOs :)

>
>
> I maintain my initial position that just using
>
> max_ptes_none == 511 -> collapse mTHP always
> max_ptes_none != 511 -> collapse mTHP only if all PTEs are non-none/zero
>
> as a starting point is probably simplest and best, and likely leaves room for any
> changes later.

Yes.
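
To make the rule quoted above concrete, here is a minimal sketch in plain C. It
is not code from the series: the helper name mthp_collapse_allowed() and its
parameters are hypothetical, and it assumes 4K base pages with a 2M PMD (so
HPAGE_PMD_NR is 512).

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9                        /* assumption: 4K pages, 2M PMD */
#define HPAGE_PMD_NR    (1u << HPAGE_PMD_ORDER)  /* 512 PTEs per PMD */

/*
 * Hypothetical helper, not from the patch series.
 *
 * max_ptes_none: the khugepaged tunable (0..511 with the sizes above)
 * order:         candidate mTHP order, below HPAGE_PMD_ORDER
 * nr_none:       none/zero PTEs found in the candidate range
 */
static bool mthp_collapse_allowed(unsigned int max_ptes_none,
				  unsigned int order,
				  unsigned int nr_none)
{
	assert(order >= 1 && order < HPAGE_PMD_ORDER);
	assert(nr_none <= (1u << order));

	/* max_ptes_none == 511: collapse mTHP always. */
	if (max_ptes_none == HPAGE_PMD_NR - 1)
		return true;

	/* Any other value: collapse only if no PTE is none/zero. */
	return nr_none == 0;
}
```

With this shape, every intermediate value of max_ptes_none behaves like 0 for
mTHP, which is what would leave room for an "eagerness" scale to be added later.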

>
>
> Of course, we could do what Nico is proposing here, as 1) and change it all later.

Right.

But that does mean for mTHP we're limited to 256 (or 255 was it?) but I guess
given the 'creep' issue that's sensible.

>
> It's just when it comes to documenting all that stuff in patch #15 that I feel like
> "alright, we shouldn't be doing it longterm like that, so let's not make anybody
> depend on any weird behavior here by over-documenting it".
>
> I mean
>
> "
> +To prevent "creeping" behavior where collapses continuously promote to larger
> +orders, if max_ptes_none >= HPAGE_PMD_NR/2 (255 on 4K page size), it is
> +capped to HPAGE_PMD_NR/2 - 1 for mTHP collapses. This is due to the fact
> +that introducing more than half of the pages to be non-zero it will always
> +satisfy the eligibility check on the next scan and the region will be collapse.
> "
>
> Is just way, way too detailed.
>
> I would just say "The kernel might decide to use a more conservative approach
> when collapsing smaller THPs" etc.
>
>
> Thoughts?

Well, I've sort of reviewed it the opposite way there :) or at least asked for it
to be a hell of a lot clearer (I find that comment really compressed and I just
don't really understand it).

I guess I didn't think about people reading that and relying on it, so maybe we
could alternatively make that succinct.

But I think it'd be better to say something like "mTHP collapse cannot currently
correctly function with half or more of the PTE entries empty, so we cap at just
below this level" in this case.
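
For what it's worth, the arithmetic behind the cap and the 'creep' can be
sketched as below. This is an illustration only: the function names are
hypothetical, 4K pages are assumed, and scaling the PMD-level limit to a
smaller order by a right shift is my assumption about how the per-order limit
is derived, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9                        /* assumption: 4K pages, 2M PMD */
#define HPAGE_PMD_NR    (1u << HPAGE_PMD_ORDER)  /* 512 */

/* Cap described in the quoted doc text: HPAGE_PMD_NR/2 - 1 (255 here). */
static unsigned int mthp_capped_max_ptes_none(unsigned int max_ptes_none)
{
	if (max_ptes_none >= HPAGE_PMD_NR / 2)
		return HPAGE_PMD_NR / 2 - 1;
	return max_ptes_none;
}

/* Assumption: the PMD-level limit is scaled down to 'order' by shifting. */
static unsigned int scaled_max_ptes_none(unsigned int limit, unsigned int order)
{
	assert(order <= HPAGE_PMD_ORDER);
	return limit >> (HPAGE_PMD_ORDER - order);
}

/*
 * After an order-N range has been collapsed it is fully populated. Does the
 * enclosing order-(N+1) range then pass the check even when its other half
 * is entirely empty? If so, collapse keeps promoting ("creeps") upwards.
 */
static bool collapse_creeps(unsigned int limit, unsigned int order)
{
	unsigned int empty_half;

	assert(order < HPAGE_PMD_ORDER);
	empty_half = 1u << order;
	return empty_half <= scaled_max_ptes_none(limit, order + 1);
}
```

Under these assumptions a limit of 256 lets an order-4 collapse make the
enclosing order-5 range eligible (16 empty PTEs against a scaled limit of 16),
while the capped value of 255 does not (scaled limit of 15).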

>
> --
> Cheers
>
> David / dhildenb
>

Cheers, Lorenzo
