[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1782ae5-c1d6-4f46-a676-666505990f4d@lucifer.local>
Date: Wed, 25 Jun 2025 08:20:15 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
david@...hat.com, ziy@...dia.com, Liam.Howlett@...cle.com,
npache@...hat.com, ryan.roberts@....com, dev.jain@....com,
baohua@...nel.org, zokeefe@...gle.com, shy828301@...il.com,
usamaarif642@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are
disabled
On Tue, Jun 24, 2025 at 10:53:28PM -0700, Hugh Dickins wrote:
> On Wed, 25 Jun 2025, Baolin Wang wrote:
> Sorry for chiming in so late, after so much effort: but I beg you,
> please drop these.
>
> I did not want to get into a fight, and had been hoping a voice of
> reason would come from others, before I got around to responding.
There'll be no fighting :>)
I for one (and I'm absolutely confident others in this thread) very much
respect your opinion - so some civil disagreement is perfectly natural and
healthy.
This is how we get the best outcomes...
>
> And indeed Ryan understood correctly at the start; and he, Usama
> and Barry, perhaps others I've missed, have raised appropriate
> concerns but not prevailed.
>
> If we're sloganeering, I much prefer "never break userspace" to
> "never means never", attractive though that over-simplification is.
It would have been useful to have this discussion earlier indeed... :)
I disagree we're breaking userspace. See below.
>
> Seldom has a feature been so thorougly documented as MADV_COLLAPSE,
> in its 6.1 commits and in the "man 2 madvise" page: which are
> explicit about MADV_COLLAPSE providing a way to get THPs where the
> sysfs setting governing automatic behaviour does not insert them.
I disagree, I feel like the unfortunately poorly named 'never' toggle in
sysfs makes everything uncertain.
One can easily read 'is independent of any sysfs setting... both in terms
of determining THP eligibility, and allocation semantics' as meaning that
it ignores things such as madvise vs. inherit etc. But it's not clear that
it means you ignore 'never' - yes a very poorly chosen name.
I think a reasonable person would interpret 'never' to mean literally
'never'.
But if we go further, note that the man page points to the
Documentation/admin-guide/mm/transhuge.rst document which states:
Global THP controls
-------------------
Transparent Hugepage Support for anonymous memory can be entirely
disabled (mostly for debugging purposes)...
"Entirely disabled".
So I disagree that 'seldom has feature been so thoroughly documented', I
mean in a sense - yes! But also unfortunately, and unintentionally,
vaguely.
This really does come down to some poor choices made early as to wording
and names.
But I'd argue that as a result of this, absolutely the expectation of
system administrators is that never means never.
So we're sort of arguing de jure vs de facto here.
>
> We would all prefer a less messy world of THP tunables. I certainly
> find plenty to dislike there too; and wish that a less assertive name
> than "never" had been chosen originally for the default off position.
>
> But please don't break the accepted and documented behaviour of
> MADV_COLLAPSE now.
Again see above, I absolutely disagree this is documented _clearly_. And
that's the underlying issue here.
I feel like if you polled 100 system administrators (assuming they knew
about THP) as to how you globally disable THP, probably all 100 would say
you do it via:
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
So shouldn't 'never break userspace' be based on practical reality rather
than a theorised interpretation of documents that sadly are not clear
enough?
>
> If you want to exclude all possibility of THPs, then please use the
> prctl(PR_SET_THP_DISABLE); or shmem_enabled=deny (I think it was me
> who insisted that be respected by MADV_COLLAPSE back then).
While it's useful to have this, prctl() is where APIs go to die. It's a
hidden wasteland that nobody knows about, it may as well not exist.
We have a whole sysctl directory for configuring this stuff. It's sort of
crazy to have that then to have a special prctl() hidden away also...
>
> Add a "deny" option to /sys/kernel/mm/transparent_hugepage/enabled
> if you like. (But in these days of filesystem large folios, adding
> new protections against them seems a few years late.)
Based on a reasonable interpretation of 'never' I would say we retain this
series as-is, and 'deny' could be what 'never' was intended to be before.
>
> If Andrew decides that these patches should go in, then I'll have to
> scrutinize them more carefully than I've done so far: but currently
> I'm hoping to avoid that.
Sure and that'd be hugely appreciated!
>
> Hugh
Thanks for your feedback, it's much appreciated! I hope we can figure this
out sensibly.
Also note Baolin's point about mTHP which rather complicates matters.
I think we all feel the THP interfaces are... not perfectly ideal.
/sys/kernel/mm/transparent_hugepage_2 anyone? :P
Cheers, Lorenzo
Powered by blists - more mailing lists