Message-ID: <09acd558-19b9-4964-823b-502b9044f954@redhat.com>
Date: Thu, 31 Jul 2025 18:15:35 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Usama Arif <usamaarif642@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, corbet@....net, rppt@...nel.org,
surenb@...gle.com, mhocko@...e.com, hannes@...xchg.org, baohua@...nel.org,
shakeel.butt@...ux.dev, riel@...riel.com, ziy@...dia.com,
laoar.shao@...il.com, dev.jain@....com, baolin.wang@...ux.alibaba.com,
npache@...hat.com, Liam.Howlett@...cle.com, ryan.roberts@....com,
vbabka@...e.cz, jannh@...gle.com, Arnd Bergmann <arnd@...db.de>,
sj@...nel.org, linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
kernel-team@...a.com
Subject: Re: [PATCH v2 2/5] mm/huge_memory: convert "tva_flags" to "enum
tva_type" for thp_vma_allowable_order*()
On 31.07.25 16:00, Lorenzo Stoakes wrote:
> On Thu, Jul 31, 2025 at 01:27:19PM +0100, Usama Arif wrote:
>> From: David Hildenbrand <david@...hat.com>
>>
>> Describing the context through a type is much clearer, and good enough
>> for our case.
As with the other patch, I'll let Usama take it from here; just a bunch
of comments from my side.
>
> This is pretty bare bones. What context, what type? Under what
> circumstances?
>
> This also is missing detail on the key difference here - that actually it
> turns out we _don't_ need these to be flags, rather we can have _distinct_
> modes which are clearer.
>
> I'd say something like:
>
> when determining which THP orders are eligible for a VMA mapping,
> we have previously specified tva_flags, however it turns out it is
> really not necessary to treat these as flags.
>
> Rather, we distinguish between distinct modes.
>
> The only case where we previously combined flags was with
> TVA_ENFORCE_SYSFS, but we can avoid this by observing that this is
> the default, except for MADV_COLLAPSE or edge cases in
> collapse_pte_mapped_thp() and hugepage_vma_revalidate(), and adding
> a mode specifically for this case - TVA_FORCED_COLLAPSE.
>
> ... stuff about the different modes...
>
>>
>> We have:
>> * smaps handling for showing "THPeligible"
>> * Pagefault handling
>> * khugepaged handling
>> * Forced collapse handling: primarily MADV_COLLAPSE, but one other odd case
>
> Can we actually state what this case is? I mean I guess a handwave in the
> form of 'an edge case in collapse_pte_mapped_thp()' will do also.
Yeah, something like that. I think we also call it when we previously
checked that there is a THP and that we might be allowed to collapse.
E.g., collapse_pte_mapped_thp() is also called from khugepaged code
where we already checked the allowed order.
>
> Hmm actually we do weird stuff with this so maybe just handwave.
>
> Like uprobes calls collapse_pte_mapped_thp()... :/ I'm not sure this 'If we
> are here, we've succeeded in replacing all the native pages in the page
> cache with a single hugepage.' comment is even correct.
I think in all these cases we already have a THP and want to force that
collapse in the page table.
[...]
>>
>> Really, we want to ignore sysfs only when we are forcing a collapse
>> through MADV_COLLAPSE, otherwise we want to enforce.
>
> I'd say 'ignoring this edge case, ...'
>
> I think the clearest thing might be to literally list the before/after
> like:
>
> * TVA_SMAPS | TVA_ENFORCE_SYSFS -> TVA_SMAPS
> * TVA_IN_PF | TVA_ENFORCE_SYSFS -> TVA_PAGEFAULT
> * TVA_ENFORCE_SYSFS -> TVA_KHUGEPAGED
> * 0 -> TVA_FORCED_COLLAPSE
>
That makes sense.
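To make that before/after list concrete, here is a rough userspace sketch, not part of the patch. The OLD_ prefix on the removed flag bits and the helper tva_flags_to_type() are made up for illustration; only the flag values and the enumerators come from the diff.

```c
/*
 * Old flag bits removed by the patch. The OLD_ prefix is added here only
 * so they do not clash with the new enumerators of the same name.
 */
#define OLD_TVA_SMAPS		(1 << 0)
#define OLD_TVA_IN_PF		(1 << 1)
#define OLD_TVA_ENFORCE_SYSFS	(1 << 2)

/* New type added by the patch. */
enum tva_type {
	TVA_SMAPS,
	TVA_PAGEFAULT,
	TVA_KHUGEPAGED,
	TVA_FORCED_COLLAPSE,
};

/*
 * Hypothetical helper (illustration only): map a pre-patch flag
 * combination to the new type. Returns -1 for any combination that is
 * not in the before/after list.
 */
int tva_flags_to_type(unsigned long old_flags)
{
	switch (old_flags) {
	case OLD_TVA_SMAPS | OLD_TVA_ENFORCE_SYSFS:
		return TVA_SMAPS;
	case OLD_TVA_IN_PF | OLD_TVA_ENFORCE_SYSFS:
		return TVA_PAGEFAULT;
	case OLD_TVA_ENFORCE_SYSFS:
		return TVA_KHUGEPAGED;
	case 0:
		return TVA_FORCED_COLLAPSE;
	default:
		return -1;
	}
}
```

Only four of the eight possible flag combinations appear in the list, which is the argument for an enum: the combinations that are absent can no longer be expressed at all.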
>>
>> With this change, we immediately know if we are in the forced collapse
>> case, which will be valuable next.
>>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> Acked-by: Usama Arif <usamaarif642@...il.com>
>> Signed-off-by: Usama Arif <usamaarif642@...il.com>
>
> Overall this is a great cleanup, some various nits however.
>
>> ---
>> fs/proc/task_mmu.c | 4 ++--
>> include/linux/huge_mm.h | 30 ++++++++++++++++++------------
>> mm/huge_memory.c | 8 ++++----
>> mm/khugepaged.c | 18 +++++++++---------
>> mm/memory.c | 14 ++++++--------
>> 5 files changed, 39 insertions(+), 35 deletions(-)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 3d6d8a9f13fc..d440df7b3d59 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1293,8 +1293,8 @@ static int show_smap(struct seq_file *m, void *v)
>> __show_smap(m, &mss, false);
>>
>> seq_printf(m, "THPeligible: %8u\n",
>> - !!thp_vma_allowable_orders(vma, vma->vm_flags,
>> - TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL));
>> + !!thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS,
>> + THP_ORDERS_ALL));
>
> This !! is so gross, wonder if we could have a bool wrapper. But not a big
> deal.
>
> I also sort of _hate_ the smaps flag anyway, invoking this 'allowable
> orders' thing just for smaps reporting with maybe some minor delta is just
> odd.
>
> Something like `bool vma_has_thp_allowed_orders(struct vm_area_struct
> *vma);` would be nicer.
>
> Anyway thoughts for another time... :)
Yeah, that's not the only nasty bit here ... :)
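For the record, a rough sketch of what such a bool wrapper might look like, written as standalone userspace C so it compiles on its own. Everything except the wrapper's name and signature is a stand-in: the struct layout, the allowed_orders field, the THP_ORDERS_ALL value and the body of thp_vma_allowable_orders() are assumptions made up for illustration and do not reflect the kernel's definitions.

```c
#include <stdbool.h>

enum tva_type {
	TVA_SMAPS,
	TVA_PAGEFAULT,
	TVA_KHUGEPAGED,
	TVA_FORCED_COLLAPSE,
};

/* Stand-in value; the kernel builds this mask from several order masks. */
#define THP_ORDERS_ALL	(~0UL)

/* Stand-in for the kernel struct, holding only what this sketch needs. */
struct vm_area_struct {
	unsigned long vm_flags;
	unsigned long allowed_orders;	/* made-up field: bitmask of orders */
};

/* Stand-in body; the real function is far more involved. */
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
				       unsigned long vm_flags,
				       enum tva_type type,
				       unsigned long orders)
{
	(void)vm_flags;
	(void)type;
	return vma->allowed_orders & orders;
}

/* The suggested wrapper: hides both the "!!" and the smaps-only type. */
bool vma_has_thp_allowed_orders(struct vm_area_struct *vma)
{
	return thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS,
					THP_ORDERS_ALL) != 0;
}
```

With something like this, show_smap() would print the wrapper's result directly instead of open-coding the "!!" conversion.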
>
>>
>> if (arch_pkeys_enabled())
>> seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 71db243a002e..b0ff54eee81c 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -94,12 +94,15 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
>> #define THP_ORDERS_ALL \
>> (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
>>
>> -#define TVA_SMAPS (1 << 0) /* Will be used for procfs */
>
> Dumb question, but what does 'TVA' stand for? :P
Whoever came up with that probably took it from the name of the function
this is passed to:
thp_vma_allowable_orders()
>
>> -#define TVA_IN_PF (1 << 1) /* Page fault handler */
>> -#define TVA_ENFORCE_SYSFS (1 << 2) /* Obey sysfs configuration */
>> +enum tva_type {
>> + TVA_SMAPS, /* Exposing "THPeligible:" in smaps. */
>
> How I hate this flag (just an observation...)
>
>> + TVA_PAGEFAULT, /* Serving a page fault. */
>> + TVA_KHUGEPAGED, /* Khugepaged collapse. */
>
> This is equivalent to the TVA_ENFORCE_SYSFS case before, sort of a default
> I guess, but actually quite nice to add the context that it's sourced from
> khugepaged - I assume this will always be the case when specified?
>
>> + TVA_FORCED_COLLAPSE, /* Forced collapse (i.e., MADV_COLLAPSE). */
>
> Would put 'e.g.' here, then that allows 'space' for the edge case...
Makes sense.
>
>> +};
>>
>> -#define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
>> - (!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
>> +#define thp_vma_allowable_order(vma, vm_flags, type, order) \
>> + (!!thp_vma_allowable_orders(vma, vm_flags, type, BIT(order)))
>
> Nit, but maybe worth keeping tva_ prefix - tva_type - here just so it's
> clear what type it refers to.
>
> But not end of the world.
>
> Same comment goes for param names below etc.
No strong opinion, but I prefer to drop the prefix when it can be
deduced from the type and we are inside of the very function that
essentially defines these types (tva prefix is implicit, no other type
applies).
These should probably just be inline functions at some point, with proper
types and doc (separate patch in the future, of course).
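A minimal sketch of the inline-function variant, again as standalone userspace C and not a proposal for the actual patch. The struct, the allowed_orders field and the body of thp_vma_allowable_orders() are made-up stand-ins; the parameter types (unsigned long for vm_flags, unsigned int for order) are assumptions.

```c
#include <stdbool.h>

#define BIT(nr)	(1UL << (nr))

enum tva_type {
	TVA_SMAPS,
	TVA_PAGEFAULT,
	TVA_KHUGEPAGED,
	TVA_FORCED_COLLAPSE,
};

/* Stand-in for the kernel struct, holding only what this sketch needs. */
struct vm_area_struct {
	unsigned long vm_flags;
	unsigned long allowed_orders;	/* made-up field: bitmask of orders */
};

/* Stand-in body; the real function is far more involved. */
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
				       unsigned long vm_flags,
				       enum tva_type type,
				       unsigned long orders)
{
	(void)vm_flags;
	(void)type;
	return vma->allowed_orders & orders;
}

/*
 * The single-order macro rewritten as a typed inline function: the
 * compiler now checks that "type" really is an enum tva_type, and the
 * bool return type replaces the "!!".
 */
static inline bool thp_vma_allowable_order(struct vm_area_struct *vma,
					   unsigned long vm_flags,
					   enum tva_type type,
					   unsigned int order)
{
	return thp_vma_allowable_orders(vma, vm_flags, type, BIT(order)) != 0;
}
```

Inside such a function the parameter can simply be called "type", since the enum in the signature already says which type is meant.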
[...]
>> +++ b/mm/khugepaged.c
>> @@ -474,8 +474,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
>> {
>> if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
>> hugepage_pmd_enabled()) {
>> - if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS,
>> - PMD_ORDER))
>> + if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
>> __khugepaged_enter(vma->vm_mm);
>> }
>> }
>> @@ -921,7 +920,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>> struct collapse_control *cc)
>> {
>> struct vm_area_struct *vma;
>> - unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
>> + enum tva_type tva_type = cc->is_khugepaged ? TVA_KHUGEPAGED :
>> + TVA_FORCED_COLLAPSE;
>
> This is great, this is so much clearer.
>
> A nit though, I mean I come back to my 'type' vs 'tva_type' nit above, this
> is inconsistent, so we should choose one approach and stick with it.
This is outside of the function, so I would prefer to keep it here, but
no strong opinion.
>
>>
>> if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
>> return SCAN_ANY_PROCESS;
>> @@ -932,7 +932,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>>
>> if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
>> return SCAN_ADDRESS_RANGE;
>> - if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER))
>> + if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_type, PMD_ORDER))
>> return SCAN_VMA_CHECK;
>> /*
>> * Anon VMA expected, the address may be unmapped then
>> @@ -1532,9 +1532,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>> * in the page cache with a single hugepage. If a mm were to fault-in
>> * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>> * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>> - * analogously elide sysfs THP settings here.
>> + * analogously elide sysfs THP settings here and pretend we are
>> + * collapsing.
>
> I think saying pretending here is potentially confusing, maybe worth saying
> 'force collapse'?
Makes sense.
--
Cheers,
David / dhildenb