lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cbc95f9b-1c13-45ec-8d34-38544d3f2dd3@lucifer.local>
Date: Thu, 15 May 2025 21:35:32 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Usama Arif <usamaarif642@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        hannes@...xchg.org, shakeel.butt@...ux.dev, riel@...riel.com,
        ziy@...dia.com, laoar.shao@...il.com, baolin.wang@...ux.alibaba.com,
        Liam.Howlett@...cle.com, npache@...hat.com, ryan.roberts@....com,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        kernel-team@...a.com
Subject: Re: [PATCH 1/6] prctl: introduce PR_THP_POLICY_DEFAULT_HUGE for the
 process

On Thu, May 15, 2025 at 09:12:13PM +0200, David Hildenbrand wrote:
> On 15.05.25 20:08, Lorenzo Stoakes wrote:
> > On Thu, May 15, 2025 at 06:11:55PM +0200, David Hildenbrand wrote:
> > > > > > So if you're not overriding VM_NOHUGEPAGE, the whole point of this exercise
> > > > > > is to override global 'never'?
> > > > > >
> > > > >
> > > > > Again, I am not overriding never.
> > > > >
> > > > > hugepage_global_always and hugepage_global_enabled will evaluate to false
> > > > > and you will not get a hugepage.
> > > >
> > > > Yeah, again ack, but I kind of hate that we set VM_HUGEPAGE everywhere even
> > > > if the policy is never.
> > >
> > > I think it should behave just as if someone does manually an madvise(). So
> > > whatever we do here during an madvise, we should try to do the same thing
> > > here.
> >
> > Ack I agree with this.
> >
> > It actually simplifies things a LOT to view it this way - we're saying 'by
> > default apply madvise(...) to new VMAs'.
> >
> > Hm I wonder if we could have a more generic version of this...
> >
> > Note though that we're not _quite_ doing this.
> >
> > So in hugepage_madvise():
> >
> > int hugepage_madvise(struct vm_area_struct *vma,
> > 		     unsigned long *vm_flags, int advice)
> > {
> > 	...
> >
> > 	switch (advice) {
> > 	case MADV_HUGEPAGE:
> > 		*vm_flags &= ~VM_NOHUGEPAGE;
> > 		*vm_flags |= VM_HUGEPAGE;
> >
> > 		...
> >
> > 		break;
> >
> > 		...
> > 	}
> >
> > 	...
> > }
> >
> > So here we're actually clearing VM_NOHUGEPAGE and overriding it, but in the
> > proposed code we're not.
>
> Yeah, I think I suggested that, but probably we should just do exactly what
> madvise() does.

Yes, agreed.

Usama - do you have any issue with us switching to how madvise() does it?

>
> >
> > So we're back into confusing territory again :)
> >
> > I wonder if we could...
> >
> > 1. Add an MADV_xxx that mimics the desired behaviour here.
> >
> > 2. Add a generic 'madvise() by default' thing at a process level?
> >
> > Is this crazy?
>
> I think that's what I had in mind, just a bit twisted.
>
> What could work is
>
> 1) prctl to set the default
>
> 2) madvise() to adjust all existing VMAs
>
>
> We might have to teach 2) to ignore non-compatible VMAs / holes. Maybe not,
> worth an investigation.

Yeah, I think it'd _probably_ be ok except on s390 (which can fail, and so
we'd have to be able to say - skip on error, carry on).

We'll just get an -ENOMEM at the end for the gaps (god how I hate
that). Otherwise I don't think MADV_HUGEPAGE actually is really that
restrictive.

That would simplify :)

But I still so hate using prctl()... this might be one of those cases where
we simply figure out we have no other choice.

But when you put it as simply as this maybe it's not so bad. With the
flags2 gone by fixing this stupid 32-bit limit it's less awful.

Perhaps worth seeing what an improved RFC of this series looks like with
all the various bits fixed to give an idea.

But you do then wonder if we could make this _generic_ for _any_ madvise(),
and how _that_ would look.

But perhaps that's insane because many VMAs would simply not be suited to
having certain madvise flags set hmm.

Maybe let me have a think about an improved madvise() interface along these
lines anyway in general... interesting thought experiment :)

>
>
> --
> Cheers,
>
> David / dhildenb
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ