Message-ID: <CAAa6QmQBmZ-82PwzLao=gO-+1u=GFyPogmVOjAFQ-esVdo9tYQ@mail.gmail.com>
Date: Fri, 24 Oct 2025 06:54:44 -0700
From: "Zach O'Keefe" <zokeefe@...gle.com>
To: Pedro Falcato <pfalcato@...e.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, David Hildenbrand <david@...hat.com>, 
	"Christoph Lameter (Ampere)" <cl@...two.org>, Nico Pache <npache@...hat.com>, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org, linux-mm@...ck.org, 
	linux-doc@...r.kernel.org, ziy@...dia.com, baolin.wang@...ux.alibaba.com, 
	Liam.Howlett@...cle.com, ryan.roberts@....com, dev.jain@....com, 
	corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org, 
	mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org, baohua@...nel.org, 
	willy@...radead.org, peterx@...hat.com, wangkefeng.wang@...wei.com, 
	usamaarif642@...il.com, sunnanyong@...wei.com, vishal.moola@...il.com, 
	thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com, kas@...nel.org, 
	aarcange@...hat.com, raquini@...hat.com, anshuman.khandual@....com, 
	catalin.marinas@....com, tiwai@...e.de, will@...nel.org, 
	dave.hansen@...ux.intel.com, jack@...e.cz, jglisse@...gle.com, 
	surenb@...gle.com, hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com, 
	rdunlap@...radead.org, hughd@...gle.com, richard.weiyang@...il.com, 
	lance.yang@...ux.dev, vbabka@...e.cz, rppt@...nel.org, jannh@...gle.com, 
	Bagas Sanjaya <bagasdotme@...il.com>
Subject: Re: [PATCH v12 mm-new 15/15] Documentation: mm: update the admin
 guide for mTHP collapse

On Thu, Oct 23, 2025 at 1:44 AM Pedro Falcato <pfalcato@...e.de> wrote:
>
> On Thu, Oct 23, 2025 at 09:00:10AM +0100, Lorenzo Stoakes wrote:
> > On Wed, Oct 22, 2025 at 10:22:08PM +0200, David Hildenbrand wrote:
> > > On 22.10.25 21:52, Christoph Lameter (Ampere) wrote:
> > > > On Wed, 22 Oct 2025, Nico Pache wrote:
> > > >
> > > > > Currently, madvise_collapse only supports collapsing to PMD-sized THPs
> > > > > and does not attempt mTHP collapses.
> > > >
> > > > madvise collapse is frequently used as far as I can tell from the THP
> > > > loads being tested. Could we support madvise collapse for mTHP?
> > >
> > > The big question is still how user space can communicate the desired order,
> > > and how we can not break existing users.
> >
>
> Do we want to let userspace communicate order? It seems like an extremely
> specific thing to do. A simpler and saner semantic could be something like:
> "MADV_COLLAPSE collapses a given [addr, addr+len] range into the highest
> order THP it can/thinks it should.". The implementation details of PMD or
> contpte or <...> are lost by the time we get to userspace.
>
> The man page itself is pretty vaguely written to allow us to do whatever
> we want. It sounds to me that allowing userspace to create arbitrary order
> mTHPs would be another Pandora's box we shouldn't get into.
>
> > Yes, and let's go one step at a time, this series still needs careful scrutiny
> > and we need to ensure the _fundamentals_ are in place for khugepaged before we
> > get into MADV_COLLAPSE :)
> >
> > >
> > > So I guess there will definitely be some support to trigger collapse to mTHP
> > > in the future, the big question is through which interface. So it will
> > > happen after this series.
> >
> > Yes.
> >
> > >
> > > Maybe through process_madvise() where we have an additional parameter, I
> > > think that was what people discussed in the past.
> >
> > I wouldn't absolutely love us doing that, given it is a general parameter so
> > would seem applicable to any madvise() option and could lead to confusion, also
> > process_madvise() was originally for cross-process madvise vector operations.
>
> For what it's worth, it would probably not be too hard to devise a generic
> separation there between "generic flags" and "behavior-specific flags".
> And then stuff the desired THP order into MADV_COLLAPSE-specific flags.

Yeah, this is how I envisioned the flags being used: reserve some
number of bits for generic flags, and overload the rest for
advice-specific ones. I suspect that once the seal is broken on this,
more advice-specific flags will promptly follow.

> >
> > I expanded this to make it applicable to the current process (and introduced
> > PIDFD_SELF to make that more sane), and SJ has optimised it across vector
> > operations (thanks SJ! :), but in general - it seems very weird to have
> > madvise() provide an operation that process_madvise() provides another version
> > of, one that has an extra parameter.
> >
> > As usual we've painted ourselves into a corner with an API... :)
>
> But yes, I agree it would feel weird.
>
> >
> > Perhaps we'll have to accept the process_madvise() compromise and add
> > MADV_COLLAPSE_MTHP that only works with it or something.
> >
> > Of course adding a new syscall isn't impossible... madvise2() not very appealing
> > however...
>
> It is my impression that process_madvise() is already madvise2(), but
> poorly named.

+1

> >
> > TL;DR I guess we'll deal with that when we come to it :)
>
> Amen :)
>
> --
> Pedro