lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <decbae07-0c84-4379-9f9d-6e2bd6dbad6e@lucifer.local>
Date: Wed, 25 Jun 2025 09:37:41 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Hugh Dickins <hughd@...gle.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
        ziy@...dia.com, Liam.Howlett@...cle.com, npache@...hat.com,
        ryan.roberts@....com, dev.jain@....com, baohua@...nel.org,
        zokeefe@...gle.com, shy828301@...il.com, usamaarif642@...il.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are
 disabled

On Wed, Jun 25, 2025 at 10:24:53AM +0200, David Hildenbrand wrote:
> On 25.06.25 10:12, Lorenzo Stoakes wrote:
> > On Wed, Jun 25, 2025 at 08:55:28AM +0100, Lorenzo Stoakes wrote:
> > > I suppose the least awful way of addressing Baolin's concerns re: mTHP
> > > while simultaneosly keeping existing semantics is:
> > >
> > > 1. Introduce deny to mean what never should have meant.
> >
> > To fix Baolin's issue btw we'd have to add 'deny' to both 'global' settings
> > _and_ each page size setting.
> >
> > Because otherwise we'd end up in a weird case where say:
> >
> > global 'deny'
> >
> >   2 MiB 'never'
> > 64 KiB 'inherit'
> >
> > And err... get 2 MiB THP pages from MADV_COLLAPSE :)
> >
> > Or:
> >
> > global 'deny'
> >
> >   2 MiB 'never'
> > 64 KiB 'always'
> >
> > Or:
> >
> > global 'never'
> >
> >   2 MiB 'never'
> > 64 KiB 'always'
> >
> > Or:
> >
> > global 'never'
> >
> >   2 MiB 'madvise'
> > 64 KiB 'always'
> >
> > All doing the same. Not very clear is it?
> >
> > We have sowed the seeds of something terrible here, truly.
>
> Fully agreed. "Deny" is nasty. Maybe if we really need a way to disable
> "madv_collapse", it should be done differently, not using this toggle here.

Yeah maybe the best way is to just have another tunable for this?

/sys/kernel/mm/transparent_hugepage/disable_collapse perhaps?

What do you think Hugh, Baolin?

>
> Regarding MADV_COLLAPSE, I strongly assume that we should not change it to
> collapse smaller mTHPs as part of the khugepaged mTHP work. For now, it will
> simply always collapse to PMD THPs.

Yeah thinking about it maybe this is the best way. And we can then update
the man page to make this ABUNDANTLY clear (am happy to do this).

This keeps things simple.

(One side note on PMD-sized MADV_COLLAPSE - this is basically completely
useless for 64 KB page size arm64 systems where PMD's are 512 MB :)

Thoughts Baolin?

>
> Once we want to support other sizes, likely MADV_COLLAPSE users want to have
> better control over which size to use, at which point it all gets nasty.

madvise2() this time with extra parameters? ;)

I sort of wish we had added a flags parameter there.

But lacking a time machine... :)

>
> --
> Cheers,
>
> David / dhildenb
>

To summarise:

Drop series:

* Might degrade performance for very specific users using
  never/MADV_COLLAPSE (quite possibly via process_madvise() + a remote
  process).

* Matches 'de jure' interpretation of documentation.

Keep series:

* Provides no means whatsoever to have a 'manual only' collapse mode,
  though does provide for manual khugepaged THP.

* MADV_COLLAPSE automatically gets mTHP support based on obeying 'never'.

* Matches likely 'de facto' understanding system admins have about THP
  usage.

Action items:

* Either way, I (Lorenzo) will improve documentation.

* If we drop the series, provide another means to disable
  MADV_COLLAPSE. But not using existing sysfs toggles, something new. We
  will document MADV_COLLAPSE as PMD only.

* If we drop the series, also consider how we might provide mTHP-compatible
  MADV_COLLAPSE.

* Totally and completely refactor the hell out of the THP implementation
  from top-to-bottom (over time this is becoming more and more of a me
  thing... as I'm getting ever more frustrated with the implementation ;)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ