[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c62m3tyr6co7jqdrwhtp7exnewhogxtife7g6yh4gve7gqecz6@b5xpocyvifxp>
Date: Thu, 23 Oct 2025 09:44:34 +0100
From: Pedro Falcato <pfalcato@...e.de>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
David Hildenbrand <david@...hat.com>
Cc: "Christoph Lameter (Ampere)" <cl@...two.org>,
Nico Pache <npache@...hat.com>, linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-doc@...r.kernel.org, ziy@...dia.com,
baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, ryan.roberts@....com, dev.jain@....com,
corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org, baohua@...nel.org,
willy@...radead.org, peterx@...hat.com, wangkefeng.wang@...wei.com,
usamaarif642@...il.com, sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com, kas@...nel.org, aarcange@...hat.com,
raquini@...hat.com, anshuman.khandual@....com, catalin.marinas@....com,
tiwai@...e.de, will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz,
jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com, hannes@...xchg.org,
rientjes@...gle.com, mhocko@...e.com, rdunlap@...radead.org, hughd@...gle.com,
richard.weiyang@...il.com, lance.yang@...ux.dev, vbabka@...e.cz, rppt@...nel.org,
jannh@...gle.com, Bagas Sanjaya <bagasdotme@...il.com>
Subject: Re: [PATCH v12 mm-new 15/15] Documentation: mm: update the admin
guide for mTHP collapse
On Thu, Oct 23, 2025 at 09:00:10AM +0100, Lorenzo Stoakes wrote:
> On Wed, Oct 22, 2025 at 10:22:08PM +0200, David Hildenbrand wrote:
> > On 22.10.25 21:52, Christoph Lameter (Ampere) wrote:
> > > On Wed, 22 Oct 2025, Nico Pache wrote:
> > >
> > > > Currently, madvise_collapse only supports collapsing to PMD-sized THPs +
> > > > and does not attempt mTHP collapses. +
> > >
> > > madvise collapse is frequently used as far as I can tell from the THP
> > > loads being tested. Could we support madvise collapse for mTHP?
> >
> > The big question is still how user space can communicate the desired order,
> > and how we can not break existing users.
>
Do we want to let userspace communicate order? It seems like an extremely
specific thing to do. A more simple&sane semantic could be something like:
"MADV_COLLAPSE collapses a given [addr, addr+len] range into the highest
order THP it can/thinks it should.". The implementation details of PMD or
contpte or <...> are lost by the time we get to userspace.
The man page itself is pretty vaguely written to allow us to do whatever
we want. It sounds to me that allowing userspace to create arbitrary order
mTHPs would be another pandora's box we shouldn't get into.
> Yes, and let's go one step at a time, this series still needs careful scrutiny
> and we need to ensure the _fundamentals_ are in place for khugepaged before we
> get into MADV_COLLAPSE :)
>
> >
> > So I guess there will definitely be some support to trigger collapse to mTHP
> > in the future, the big question is through which interface. So it will
> > happen after this series.
>
> Yes.
>
> >
> > Maybe through process_madvise() where we have an additional parameter, I
> > think that was what people discussed in the past.
>
> I wouldn't absolutely love us doing that, given it is a general parameter so
> would seem applicable to any madvise() option and could lead to confusion, also
> process_madvise() was originally for cross-process madvise vector operations.
For what it's worth, it would probably not be too hard to devise a generic
separation there between "generic flags" and "behavior-specific flags".
And then stuff the desired THP order into MADV_COLLAPSE-specific flags.
>
> I expanded this to make it applicable to the current process (and introduced
> PIDFD_SELF to make that more sane), and SJ has optimised it across vector
> operations (thanks SJ! :), but in general - it seems very weird to have
> madvise() provide an operation that process_madvise() providse another version
> of that has an extra parameter.
>
> As usual we've painted ourselves into a corner with an API... :)
But yes, I agree it would feel weird.
>
> Perhaps we'll to accept the process_madvise() compromise and add
> MADV_COLLAPSE_MHTP that only works with it or something.
>
> Of course adding a new syscall isn't impossible... madvise2() not very appealing
> however...
It is my impression that process_madvise() is already madvise2(), but
poorly named.
>
> TL;DR I guess we'll deal with that when we come to it :)
Amen :)
--
Pedro
Powered by blists - more mailing lists