Message-ID: <479b0409-335f-4450-8eb2-5270a5847f5e@linux.dev>
Date: Wed, 19 Nov 2025 20:13:10 +0800
From: Qi Zheng <qi.zheng@...ux.dev>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>, will@...nel.org,
aneesh.kumar@...nel.org, npiggin@...il.com, peterz@...radead.org,
dev.jain@....com, akpm@...ux-foundation.org, ioworker0@...il.com
Cc: linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-alpha@...r.kernel.org,
linux-snps-arc@...ts.infradead.org, loongarch@...ts.linux.dev,
linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
linux-um@...ts.infradead.org, Qi Zheng <zhengqi.arch@...edance.com>
Subject: Re: [PATCH 7/7] mm: make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE && 64BIT

On 11/19/25 7:35 PM, David Hildenbrand (Red Hat) wrote:
> On 19.11.25 12:02, Qi Zheng wrote:
>> Hi David,
>>
>> On 11/19/25 6:19 PM, David Hildenbrand (Red Hat) wrote:
>>> On 18.11.25 13:02, Qi Zheng wrote:
>>>>
>>>>
>>>> On 11/18/25 12:57 AM, David Hildenbrand (Red Hat) wrote:
>>>>> On 14.11.25 12:11, Qi Zheng wrote:
>>>>>> From: Qi Zheng <zhengqi.arch@...edance.com>
>>>>>
>>>>> Subject: s/&&/&/
>>>>
>>>> will do.
>>>>
>>>>>
>>>>>>
>>>>>> Make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE so that PT_RECLAIM
>>>>>> can be enabled by default on all architectures that support
>>>>>> MMU_GATHER_RCU_TABLE_FREE.
>>>>>>
>>>>>> Considering that a large number of PTE page table pages (such as 100GB+)
>>>>>> can only occur on a 64-bit system, let PT_RECLAIM also depend on 64BIT.
>>>>>>
>>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@...edance.com>
>>>>>> ---
>>>>>> arch/x86/Kconfig | 1 -
>>>>>> mm/Kconfig | 6 +-----
>>>>>> 2 files changed, 1 insertion(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>>>> index eac2e86056902..96bff81fd4787 100644
>>>>>> --- a/arch/x86/Kconfig
>>>>>> +++ b/arch/x86/Kconfig
>>>>>> @@ -330,7 +330,6 @@ config X86
>>>>>> select FUNCTION_ALIGNMENT_4B
>>>>>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
>>>>>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
>>>>>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64
>>>>>> select ARCH_SUPPORTS_SCHED_SMT if SMP
>>>>>> select SCHED_SMT if SMP
>>>>>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP
>>>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>>>> index a5a90b169435d..e795fbd69e50c 100644
>>>>>> --- a/mm/Kconfig
>>>>>> +++ b/mm/Kconfig
>>>>>> @@ -1440,14 +1440,10 @@ config ARCH_HAS_USER_SHADOW_STACK
>>>>>> The architecture has hardware support for userspace shadow call
>>>>>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
>>>>>> -config ARCH_SUPPORTS_PT_RECLAIM
>>>>>> - def_bool n
>>>>>> -
>>>>>> config PT_RECLAIM
>>>>>> bool "reclaim empty user page table pages"
>>>>>> default y
>>>>>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>>>> - select MMU_GATHER_RCU_TABLE_FREE
>>>>>> + depends on MMU_GATHER_RCU_TABLE_FREE && MMU && SMP && 64BIT
>>>>>
>>>>> How would we have MMU_GATHER_RCU_TABLE_FREE without MMU? (Can we drop
>>>>> the MMU part?)
>>>>
>>>> OK.
>>>>
>>>>>
>>>>> Why do we care about SMP in the first place? (Can we drop SMP?)
>>>>
>>>> OK.
>>>>
>>>>>
>>>>> But I also wonder why we need "MMU_GATHER_RCU_TABLE_FREE && 64BIT":
>>>>>
>>>>> Would it be harmful on 32bit (sure, we might not reclaim as much, but
>>>>> still there is memory to be reclaimed?)?
>>>>
>>>> This is also fine on 32bit, but the benefits are not significant, so I
>>>> chose to enable it only on 64-bit.
>>>
>>> Right. Address space is smaller, but also memory is smaller. Not that I
>>> think we strictly *must* support 32bit, I merely wonder why we
>>> wouldn't just enable it here.
>>>
>>> OTOH, if there is a good reason we cannot enable it, we can definitely
>>> just keep it 64bit only.
>>
>> The only difficulty is this:
>>
>>>
>>>>
>>>> I actually tried enabling MMU_GATHER_RCU_TABLE_FREE on all
>>>> architectures, and apart from sparc32 being a bit troublesome (because
>>>> it uses mm->page_table_lock for synchronization within
>>>> __pte_free_tlb()), the modifications were relatively simple.
>>
>> in sparc32:
>>
>> void pte_free(struct mm_struct *mm, pgtable_t ptep)
>> {
>> 	struct page *page;
>>
>> 	page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
>> 	spin_lock(&mm->page_table_lock);
>> 	if (page_ref_dec_return(page) == 1)
>> 		pagetable_dtor(page_ptdesc(page));
>> 	spin_unlock(&mm->page_table_lock);
>>
>> 	srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
>> }
>>
>> #define __pte_free_tlb(tlb, pte, addr) pte_free((tlb)->mm, pte)
>>
>> To enable MMU_GATHER_RCU_TABLE_FREE on sparc32, we need to implement
>> __tlb_remove_table(), and call the pte_free() above in
>> __tlb_remove_table().
>>
>> However, the __tlb_remove_table() does not have an mm parameter:
>>
>> void __tlb_remove_table(void *_table)
>>
>> so we need to use another lock instead of mm->page_table_lock.
>>
>> I have already sent the v2 [1], and perhaps after that I can enable
>> PT_RECLAIM on all 32-bit architectures as well.
>>
>
> I guess if we just make it depend on MMU_GATHER_RCU_TABLE_FREE that will
> be fine.
>
>> [1]. https://lore.kernel.org/all/cover.1763537007.git.zhengqi.arch@...edance.com/
>>
>>>>
>>>>>
>>>>> If all 64BIT support MMU_GATHER_RCU_TABLE_FREE (as you previously
>>>>> stated), why can't we only check for 64BIT?
>>>>
>>>> OK, will do.
>>>
>>> This was also more of a question for discussion:
>>>
>>> Would it make sense to have
>>>
>>> config PT_RECLAIM
>>> def_bool y
>>> depends on MMU_GATHER_RCU_TABLE_FREE
>>
>> Makes sense.
>>
>>>
>>> (a) Would we want to make it configurable (why?)
>>
>> No, it was just out of caution before.
>>
>>> (b) Do we really care about SMP (why?)
>>
>> No, simply because the following situation cannot occur:
>>
>> pte_offset_map
>> traversing the PTE page table
>>
>> <preemption or hardirq>
>>
>> call madvise(MADV_DONTNEED)
>>
>> so there's no need to free PTE page via RCU.
>>
>>> (c) Do we want to limit to 64bit (why?)
>>
>> No, it is just that the benefit is greater on 64-bit.
>
> I was briefly wondering if on 32bit (but maybe also on 64bit with
> configurable user page table levels?) we could have the scenario that we
> only have two page table levels.
>
> So reclaiming the PMD level (corresponding to the highest level) would

Reclaiming the PMD level? PT_RECLAIM only reclaims PTE pages, not PMD
pages. Am I misunderstanding something?

> be impossible. But for that to happen one would have to discard the
> whole address range through MADV_DONTNEED (impossible I guess) :)
>
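To illustrate the sparc32 point quoted above (a table-free callback that
receives only the table pointer cannot take mm->page_table_lock), here is a
minimal user-space sketch. It is not kernel code and not the actual fix:
model_table, model_table_lock and model_tlb_remove_table() are hypothetical
names, a C11 atomic_flag spinlock stands in for a kernel spinlock, and a
plain integer stands in for the page refcount.

```c
/*
 * User-space model only, NOT kernel code. Every name here is hypothetical
 * and exists purely to illustrate the locking problem discussed above.
 */
#include <stdatomic.h>

struct model_table {
	int refcount;     /* stands in for the page refcount */
	int dtor_called;  /* set once the destructor step would run */
};

/* One global lock, replacing the per-mm page_table_lock in this model. */
static atomic_flag model_table_lock = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&model_table_lock,
						 memory_order_acquire))
		;
}

static void model_unlock(void)
{
	atomic_flag_clear_explicit(&model_table_lock, memory_order_release);
}

/*
 * Same shape as __tlb_remove_table(): only an opaque table pointer is
 * available, so no mm (and no mm->page_table_lock) can be reached.
 */
void model_tlb_remove_table(void *_table)
{
	struct model_table *table = _table;

	model_lock();
	/* Mirrors "page_ref_dec_return(page) == 1" in the quoted pte_free(). */
	if (--table->refcount == 1)
		table->dtor_called = 1;
	model_unlock();
}
```

Whether a single global lock, a per-page lock, or some other scheme is the
right replacement on sparc32 is left open here; the sketch only shows that
the lock has to be reachable without an mm.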