lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8f0a22c2-3176-4942-994d-58d940901ecf@redhat.com>
Date: Fri, 16 May 2025 09:45:17 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Usama Arif <usamaarif642@...il.com>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
 hannes@...xchg.org, shakeel.butt@...ux.dev, riel@...riel.com,
 ziy@...dia.com, laoar.shao@...il.com, baolin.wang@...ux.alibaba.com,
 Liam.Howlett@...cle.com, npache@...hat.com, ryan.roberts@....com,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 1/6] prctl: introduce PR_THP_POLICY_DEFAULT_HUGE for the
 process

On 15.05.25 22:35, Lorenzo Stoakes wrote:
> On Thu, May 15, 2025 at 09:12:13PM +0200, David Hildenbrand wrote:
>> On 15.05.25 20:08, Lorenzo Stoakes wrote:
>>> On Thu, May 15, 2025 at 06:11:55PM +0200, David Hildenbrand wrote:
>>>>>>> So if you're not overriding VM_NOHUGEPAGE, the whole point of this exercise
>>>>>>> is to override global 'never'?
>>>>>>>
>>>>>>
>>>>>> Again, I am not overriding never.
>>>>>>
>>>>>> hugepage_global_always and hugepage_global_enabled will evaluate to false
>>>>>> and you will not get a hugepage.
>>>>>
>>>>> Yeah, again ack, but I kind of hate that we set VM_HUGEPAGE everywhere even
>>>>> if the policy is never.
>>>>
>>>> I think it should behave just as if someone does manually an madvise(). So
>>>> whatever we do here during an madvise, we should try to do the same thing
>>>> here.
>>>
>>> Ack I agree with this.
>>>
>>> It actually simplifies things a LOT to view it this way - we're saying 'by
>>> default apply madvise(...) to new VMAs'.
>>>
>>> Hm I wonder if we could have a more generic version of this...
>>>
>>> Note though that we're not _quite_ doing this.
>>>
>>> So in hugepage_madvise():
>>>
>>> int hugepage_madvise(struct vm_area_struct *vma,
>>> 		     unsigned long *vm_flags, int advice)
>>> {
>>> 	...
>>>
>>> 	switch (advice) {
>>> 	case MADV_HUGEPAGE:
>>> 		*vm_flags &= ~VM_NOHUGEPAGE;
>>> 		*vm_flags |= VM_HUGEPAGE;
>>>
>>> 		...
>>>
>>> 		break;
>>>
>>> 		...
>>> 	}
>>>
>>> 	...
>>> }
>>>
>>> So here we're actually clearing VM_NOHUGEPAGE and overriding it, but in the
>>> proposed code we're not.
>>
>> Yeah, I think I suggested that, but probably we should just do exactly what
>> madvise() does.
> 
> Yes, agreed.
> 
> Usama - do you have any issue with us switching to how madvise() does it?
> 
>>
>>>
>>> So we're back into confusing territory again :)
>>>
>>> I wonder if we could...
>>>
>>> 1. Add an MADV_xxx that mimics the desired behaviour here.
>>>
>>> 2. Add a generic 'madvise() by default' thing at a process level?
>>>
>>> Is this crazy?
>>
>> I think that's what I had in mind, just a bit twisted.
>>
>> What could work is
>>
>> 1) prctl to set the default
>>
>> 2) madvise() to adjust all existing VMAs
>>
>>
>> We might have to teach 2) to ignore non-compatible VMAs / holes. Maybe not,
>> worth an investigation.
> 
> Yeah, I think it'd _probably_ be ok except on s390 (which can fail, and so
> we'd have to be able to say - skip on error, carry on).
> 
> We'll just get an -ENOMEM at the end for the gaps (god how I hate
> that). Otherwise I don't think MADV_HUGEPAGE actually is really that
> restrictive.
> 
> That would simplify :)
> 
> But I still so hate using prctl()... this might be one of those cases where
> we simply figure out we have no other choice.
 > > But when you put it as simply as this maybe it's not so bad. With the
> flags2 gone by fixing this stupid 32-bit limit it's less awful.
> 
> Perhaps worth seeing what an improved RFC of this series looks like with
> all the various bits fixed to give an idea.

Yes.

> 
> But you do then wonder if we could make this _generic_ for _any_ madvise(),
> and how _that_ would look.
> 
> But perhaps that's insane because many VMAs would simply not be suited to
> having certain madvise flags set hmm.

Same thinking. I think this is rather special.

In a perfect world not even the madvise(*HUGEPAGE) would exist.

But here we are ... 14 years (wow!) after

commit 0af4e98b6b095c74588af04872f83d333c958c32
Author: Andrea Arcangeli <aarcange@...hat.com>
Date:   Thu Jan 13 15:46:55 2011 -0800

     thp: madvise(MADV_HUGEPAGE)



(I'm surprised you don't complain about madvise(). IMHO, prctl() is even 
a better interface than catch-all madvise(); a syscall where an advise 
might not be an advise. I saw some funny rants about MADV_DONTNEED on 
reddit at some point ... :) mctrl() would have been clearer, at least 
for me :D )

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ