Message-ID: <7160b6ec-4da5-4273-be91-1339bd00d009@kernel.org>
Date: Wed, 19 Nov 2025 13:24:35 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Qi Zheng <qi.zheng@...ux.dev>, will@...nel.org, aneesh.kumar@...nel.org,
npiggin@...il.com, peterz@...radead.org, dev.jain@....com,
akpm@...ux-foundation.org, ioworker0@...il.com
Cc: linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-alpha@...r.kernel.org,
linux-snps-arc@...ts.infradead.org, loongarch@...ts.linux.dev,
linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
linux-um@...ts.infradead.org, Qi Zheng <zhengqi.arch@...edance.com>
Subject: Re: [PATCH 7/7] mm: make PT_RECLAIM depend on
MMU_GATHER_RCU_TABLE_FREE && 64BIT

On 19.11.25 13:13, Qi Zheng wrote:
>
>
> On 11/19/25 7:35 PM, David Hildenbrand (Red Hat) wrote:
>> On 19.11.25 12:02, Qi Zheng wrote:
>>> Hi David,
>>>
>>> On 11/19/25 6:19 PM, David Hildenbrand (Red Hat) wrote:
>>>> On 18.11.25 13:02, Qi Zheng wrote:
>>>>>
>>>>>
>>>>> On 11/18/25 12:57 AM, David Hildenbrand (Red Hat) wrote:
>>>>>> On 14.11.25 12:11, Qi Zheng wrote:
>>>>>>> From: Qi Zheng <zhengqi.arch@...edance.com>
>>>>>>
>>>>>> Subject: s/&&/&/
>>>>>
>>>>> will do.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE so that PT_RECLAIM
>>>>>>> can be enabled by default on all architectures that support
>>>>>>> MMU_GATHER_RCU_TABLE_FREE.
>>>>>>>
>>>>>>> Considering that a large number of PTE page table pages (such as
>>>>>>> 100GB+)
>>>>>>> can only be caused on a 64-bit system, let PT_RECLAIM also depend on
>>>>>>> 64BIT.
>>>>>>>
>>>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@...edance.com>
>>>>>>> ---
>>>>>>> arch/x86/Kconfig | 1 -
>>>>>>> mm/Kconfig | 6 +-----
>>>>>>> 2 files changed, 1 insertion(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>>>>> index eac2e86056902..96bff81fd4787 100644
>>>>>>> --- a/arch/x86/Kconfig
>>>>>>> +++ b/arch/x86/Kconfig
>>>>>>> @@ -330,7 +330,6 @@ config X86
>>>>>>> select FUNCTION_ALIGNMENT_4B
>>>>>>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
>>>>>>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
>>>>>>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64
>>>>>>> select ARCH_SUPPORTS_SCHED_SMT if SMP
>>>>>>> select SCHED_SMT if SMP
>>>>>>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP
>>>>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>>>>> index a5a90b169435d..e795fbd69e50c 100644
>>>>>>> --- a/mm/Kconfig
>>>>>>> +++ b/mm/Kconfig
>>>>>>> @@ -1440,14 +1440,10 @@ config ARCH_HAS_USER_SHADOW_STACK
>>>>>>> The architecture has hardware support for userspace shadow
>>>>>>> call
>>>>>>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
>>>>>>> -config ARCH_SUPPORTS_PT_RECLAIM
>>>>>>> - def_bool n
>>>>>>> -
>>>>>>> config PT_RECLAIM
>>>>>>> bool "reclaim empty user page table pages"
>>>>>>> default y
>>>>>>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>>>>> - select MMU_GATHER_RCU_TABLE_FREE
>>>>>>> + depends on MMU_GATHER_RCU_TABLE_FREE && MMU && SMP && 64BIT
>>>>>>
>>>>>> How would we have MMU_GATHER_RCU_TABLE_FREE without MMU? (can we drop
>>>>>> the MMU part)
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>> Why do we care about SMP in the first place? (can we drop SMP)
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>> But I also wonder why we need "MMU_GATHER_RCU_TABLE_FREE && 64BIT":
>>>>>>
>>>>>> Would it be harmful on 32bit (sure, we might not reclaim as much, but
>>>>>> still there is memory to be reclaimed?)?
>>>>>
>>>>> This is also fine on 32bit, but the benefits are not significant, so I
>>>>> chose to enable it only on 64-bit.
>>>>
>>>> Right. Address space is smaller, but also memory is smaller. Not that I
>>>> think we strictly *must* support 32bit, I merely wonder why we
>>>> wouldn't just enable it here.
>>>>
>>>> OTOH, if there is a good reason we cannot enable it, we can definitely
>>>> just keep it 64bit only.
>>>
>>> The only difficulty is this:
>>>
>>>>
>>>>>
>>>>> I actually tried enabling MMU_GATHER_RCU_TABLE_FREE on all
>>>>> architectures, and apart from sparc32 being a bit troublesome (because
>>>>> it uses mm->page_table_lock for synchronization within
>>>>> __pte_free_tlb()), the modifications were relatively simple.
>>>
>>> in sparc32:
>>>
>>> void pte_free(struct mm_struct *mm, pgtable_t ptep)
>>> {
>>> struct page *page;
>>>
>>> page = pfn_to_page(__nocache_pa((unsigned long)ptep) >>
>>> PAGE_SHIFT);
>>> spin_lock(&mm->page_table_lock);
>>> if (page_ref_dec_return(page) == 1)
>>> pagetable_dtor(page_ptdesc(page));
>>> spin_unlock(&mm->page_table_lock);
>>>
>>> srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
>>> }
>>>
>>> #define __pte_free_tlb(tlb, pte, addr) pte_free((tlb)->mm, pte)
>>>
>>> To enable MMU_GATHER_RCU_TABLE_FREE on sparc32, we need to implement
>>> __tlb_remove_table(), and call the pte_free() above in
>>> __tlb_remove_table().
>>>
>>> However, the __tlb_remove_table() does not have an mm parameter:
>>>
>>> void __tlb_remove_table(void *_table)
>>>
>>> so we need to use another lock instead of mm->page_table_lock.
>>>
>>> I have already sent the v2 [1], and perhaps after that I can enable
>>> PT_RECLAIM on all 32-bit architectures as well.
>>>
>>
>> I guess if we just make it depend on MMU_GATHER_RCU_TABLE_FREE that will
>> be fine.
>>
>>> [1]. https://lore.kernel.org/all/cover.1763537007.git.zhengqi.arch@...edance.com/
>>>
>>>>>
>>>>>>
>>>>>> If all 64BIT support MMU_GATHER_RCU_TABLE_FREE (as you previously
>>>>>> state), why can't we only check for 64BIT?
>>>>>
>>>>> OK, will do.
>>>>
>>>> This was also more of a question for discussion:
>>>>
>>>> Would it make sense to have
>>>>
>>>> config PT_RECLAIM
>>>> def_bool y
>>>> depends on MMU_GATHER_RCU_TABLE_FREE
>>>
>>> Makes sense.
>>>
>>>>
>>>> (a) Would we want to make it configurable (why?)
>>>
>>> No, it was just out of caution before.
>>>
>>>> (b) Do we really care about SMP (why?)
>>>
>>> No. Simply because the following situation is impossible to occur:
>>>
>>> pte_offset_map
>>> traversing the PTE page table
>>>
>>> <preemption or hardirq>
>>>
>>> call madvise(MADV_DONTNEED)
>>>
>>> so there's no need to free PTE page via RCU.
>>>
>>>> (c) Do we want to limit to 64bit (why?)
>>>
>>> No, just because the benefit is greater on 64-bit.
>>
>> I was briefly wondering if on 32bit (but maybe also on 64bit with
>> configurable user page table levels?) we could have the scenario that we
>> only have two page table levels.
>>
>> So reclaiming the PMD level (corresponding to the highest level) would
>
> Reclaiming the PMD level? PT_RECLAIM only reclaims PTE pages, not PMD
> pages. Am I misunderstanding something?

Sorry, I looked too much into PMD table sharing over the last few days :D

You're right, it would work in any case, even with only 2 levels of page
tables.
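
On the sparc32 point quoted above: __tlb_remove_table() takes only a
void *, so the refcount drop in pte_free() cannot take
mm->page_table_lock from there. Below is a minimal user-space sketch of
that shape, assuming a single global lock stands in for the per-mm lock.
All model_* names are hypothetical, this is not kernel code, and the
thread does not say which lock the v2 series actually uses.

```c
#include <stdatomic.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_table {
	int refcount;    /* protected by model_table_lock */
	int dtor_called; /* set once the last user drops its reference */
};

/* One global lock replaces mm->page_table_lock (assumption). */
static atomic_flag model_table_lock = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&model_table_lock,
						 memory_order_acquire))
		;
}

static void model_unlock(void)
{
	atomic_flag_clear_explicit(&model_table_lock, memory_order_release);
}

/*
 * Same signature shape as __tlb_remove_table(void *_table): there is
 * no mm argument, so only a lock that is reachable without an mm can
 * be taken here.
 */
static void model_remove_table(void *_table)
{
	struct model_table *t = _table;

	model_lock();
	/* Mirrors "page_ref_dec_return(page) == 1" in the quoted code. */
	if (--t->refcount == 1)
		t->dtor_called = 1;
	model_unlock();
}
```

With refcount starting at 3, the first model_remove_table() call leaves
dtor_called at 0 and the second one sets it, which matches the
"== 1" check in the quoted pte_free(). A global lock serializes all
page table frees, so a real implementation might prefer something
finer grained.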
--
Cheers
David