Message-ID: <d0ebcc3d-ba81-49ca-899a-34206f8dd71f@linux.dev>
Date: Tue, 27 Jan 2026 19:47:16 +0800
From: Qi Zheng <qi.zheng@...ux.dev>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>,
Andreas Larsson <andreas@...sler.com>
Cc: linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-alpha@...r.kernel.org, loongarch@...ts.linux.dev,
linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
linux-um@...ts.infradead.org, Qi Zheng <zhengqi.arch@...edance.com>,
sparclinux <sparclinux@...r.kernel.org>, will@...nel.org,
peterz@...radead.org, akpm@...ux-foundation.org, aneesh.kumar@...nel.org,
npiggin@...il.com, dev.jain@....com, ioworker0@...il.com, linmag7@...il.com
Subject: Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on
MMU_GATHER_RCU_TABLE_FREE
On 1/27/26 7:29 PM, David Hildenbrand (Red Hat) wrote:
> On 1/26/26 07:59, Qi Zheng wrote:
>>
>>
>> On 1/23/26 11:15 PM, Andreas Larsson wrote:
>>> On 2025-12-17 10:45, Qi Zheng wrote:
>>>> From: Qi Zheng <zhengqi.arch@...edance.com>
>>>>
>>>> The PT_RECLAIM can work on all architectures that support
>>>> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on
>>>> MMU_GATHER_RCU_TABLE_FREE.
>>>>
>>>> BTW, change PT_RECLAIM to be enabled by default, since nobody should
>>>> want
>>>> to turn it off.
>>>>
>>>> Signed-off-by: Qi Zheng <zhengqi.arch@...edance.com>
>>>> ---
>>>> arch/x86/Kconfig | 1 -
>>>> mm/Kconfig | 9 ++-------
>>>> 2 files changed, 2 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index 80527299f859a..0d22da56a71b0 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -331,7 +331,6 @@ config X86
>>>> select FUNCTION_ALIGNMENT_4B
>>>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
>>>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
>>>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64
>>>> select ARCH_SUPPORTS_SCHED_SMT if SMP
>>>> select SCHED_SMT if SMP
>>>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP
>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>> index bd0ea5454af82..fc00b429b7129 100644
>>>> --- a/mm/Kconfig
>>>> +++ b/mm/Kconfig
>>>> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK
>>>> The architecture has hardware support for userspace shadow
>>>> call
>>>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
>>>> -config ARCH_SUPPORTS_PT_RECLAIM
>>>> - def_bool n
>>>> -
>>>> config PT_RECLAIM
>>>> - bool "reclaim empty user page table pages"
>>>> - default y
>>>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>> - select MMU_GATHER_RCU_TABLE_FREE
>>>> + def_bool y
>>>> + depends on MMU_GATHER_RCU_TABLE_FREE
>>>> help
>>>> Try to reclaim empty user page table pages in paths other
>>>> than munmap
>>>> and exit_mmap path.
>>>
>>> Hi,
>>>
>>> This patch unfortunately results in a WARN_ON_ONCE and unaligned
>>> accesses on sparc64:
>>>
>>> $ stress-ng --mmaphuge 20 -t 60
>>> stress-ng: info: [559] setting to a 1 min run per stressor
>>> stress-ng: info: [559] dispatching hogs: 20 mmaphuge
>>> [ 560.592569] ------------[ cut here ]------------
>>> [ 560.592663] WARNING: kernel/rcu/tree.c:3098 at
>>> __call_rcu_common.constprop.0+0x200/0x760, CPU#4: stress-ng-mmaph/568
>>> [ 560.592777] CPU: 4 UID: 1000 PID: 568 Comm: stress-ng-mmaph Not
>>> tainted 6.19.0-rc5-00127-g62fc9f6ccb97 #8 VOLUNTARY
>>> [ 560.592805] Call Trace:
>>> [ 560.592812] [<00000000004368b8>] dump_stack+0x8/0x60
>>> [ 560.592844] [<0000000000482a60>] __warn+0xe0/0x140
>>> [ 560.592878] [<0000000000482b64>] warn_slowpath_fmt+0xa4/0x120
>>> [ 560.592901] [<0000000000526a40>]
>>> __call_rcu_common.constprop.0+0x200/0x760
>>> [ 560.592931] [<0000000000526fd0>] call_rcu+0x10/0x20
>>> [ 560.592954] [<0000000000730838>] tlb_remove_table+0x98/0xc0
>>> [ 560.592986] [<000000000071bec4>] free_pgd_range+0x224/0x4c0
>>> [ 560.593021] [<000000000071c35c>] free_pgtables+0x1fc/0x240
>>> [ 560.593042] [<000000000074a6f0>] vms_clear_ptes+0x110/0x140
>>> [ 560.593068] [<000000000074c3dc>] vms_complete_munmap_vmas+0x5c/0x280
>>> [ 560.593094] [<000000000074de5c>] do_vmi_align_munmap+0x1dc/0x260
>>> [ 560.593117] [<000000000074df80>] do_vmi_munmap+0xa0/0x140
>>> [ 560.593142] [<000000000074fb2c>] __vm_munmap+0x8c/0x160
>>> [ 560.593168] [<000000000072cfd4>] vm_munmap+0x14/0x40
>>> [ 560.593190] [<00000000004402a8>] sys_64_munmap+0x88/0xa0
>>> [ 560.593221] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
>>> [ 560.593274] ---[ end trace 0000000000000000 ]---
>>> [ 560.593960] log_unaligned: 209 callbacks suppressed
>>> [ 560.593979] Kernel unaligned access at TPC[526a4c]
>>> __call_rcu_common.constprop.0+0x20c/0x760
>>> [ 560.594121] Kernel unaligned access at TPC[526864]
>>> __call_rcu_common.constprop.0+0x24/0x760
>>> [ 560.594198] Kernel unaligned access at TPC[52b3c4]
>>> rcu_segcblist_enqueue+0x24/0x40
>>> [ 560.594275] Kernel unaligned access at TPC[526860]
>>> __call_rcu_common.constprop.0+0x20/0x760
>>> [ 560.594360] Kernel unaligned access at TPC[526864]
>>> __call_rcu_common.constprop.0+0x24/0x760
>>> [ 567.054127] log_unaligned: 1105 callbacks suppressed
>>> [ 567.054167] Kernel unaligned access at TPC[526860]
>>> __call_rcu_common.constprop.0+0x20/0x760
>>> [ 567.054331] Kernel unaligned access at TPC[526864]
>>> __call_rcu_common.constprop.0+0x24/0x760
>>> [ 567.054410] Kernel unaligned access at TPC[52b3c4]
>>> rcu_segcblist_enqueue+0x24/0x40
>>
>> Thanks for your report!
>>
>> On sparc64, pmd and pud levels are not of struct page:
>
> Can you elaborate, I don't understand what you mean :)

On sparc64:

static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table,
				    bool is_page)
{
	unsigned long pgf = (unsigned long)table;

	if (is_page)
		pgf |= 0x1UL;
	tlb_remove_table(tlb, (void *)pgf);
}

static inline void __tlb_remove_table(void *_table)
{
	void *table = (void *)((unsigned long)_table & ~0x1UL);
	bool is_page = false;

	if ((unsigned long)_table & 0x1UL)
		is_page = true;
	pgtable_free(table, is_page);
}

void pgtable_free(void *table, bool is_page)
{
	if (is_page)
		__pte_free(table);
	else
		kmem_cache_free(pgtable_cache, table);
}

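For the pmd and pud levels, is_page is false and the table is freed back
to pgtable_cache with kmem_cache_free(), so it is not a ptdesc. That means
we cannot do the following in __tlb_remove_table_one():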
```
ptdesc = table;
call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
```
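
To make the failure mode concrete, here is a small userspace sketch of the
same low-bit encoding. It is only an illustration: struct demo_rcu_head,
struct demo_ptdesc and all demo_*() helpers are made-up names, and the
struct layout is an assumption, not the real kernel layout. It shows that
the value handed to tlb_remove_table() is a tagged cookie, so the address
of an embedded rcu head computed from it is not pointer-aligned when
is_page is true. When is_page is false the cookie is aligned, but it still
points at a pgtable_cache object rather than at a ptdesc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's rcu_head (needs pointer alignment). */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

/* Hypothetical stand-in for a descriptor that embeds an rcu head. */
struct demo_ptdesc {
	unsigned long flags;
	struct demo_rcu_head pt_rcu_head;
};

/* Same idea as the sparc64 pgtable_free_tlb(): bit 0 records is_page. */
static void *demo_encode(void *table, bool is_page)
{
	uintptr_t pgf = (uintptr_t)table;

	assert((pgf & 0x1UL) == 0);	/* bit 0 must be free for the tag */
	if (is_page)
		pgf |= 0x1UL;
	return (void *)pgf;
}

/* Same idea as the sparc64 __tlb_remove_table(): strip and report the tag. */
static void *demo_decode(void *cookie, bool *is_page)
{
	*is_page = ((uintptr_t)cookie & 0x1UL) != 0;
	return (void *)((uintptr_t)cookie & ~(uintptr_t)0x1UL);
}

/*
 * Address that would be passed to call_rcu() if the raw cookie were
 * treated as a demo_ptdesc. Computed with integer arithmetic so that no
 * misaligned pointer is ever dereferenced.
 */
static uintptr_t demo_rcu_head_addr(void *cookie)
{
	return (uintptr_t)cookie + offsetof(struct demo_ptdesc, pt_rcu_head);
}

/* True if that address satisfies pointer alignment. */
static bool demo_rcu_head_aligned(void *cookie)
{
	return (demo_rcu_head_addr(cookie) & (sizeof(void *) - 1)) == 0;
}
```

With a cookie produced by demo_encode(table, true), demo_rcu_head_aligned()
returns false, while demo_decode() still recovers the original pointer and
the is_page flag. Only the arch-specific decode knows how to interpret the
cookie, which is why generic code cannot cast it to a ptdesc.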
>
> Is it also a problem on architectures like s390x and ppc, where we
> squeeze multiple page tables into a physical page?

For ppc, it's the same as for sparc64.

s390x supports MMU_GATHER_RCU_TABLE_FREE and defines its own
pxx_free_tlb() helpers, but these all call tlb_remove_ptdesc(), so there
is no problem.

>
>>
>> __pmd_free_tlb/__pud_free_tlb
>> --> pgtable_free_tlb(tlb, pud/pmd, false). <=== is_page == false
>> --> tlb_remove_table
>>
>> So in __tlb_remove_table_one(), the table cannot be treated as
>> ptdesc because it does not have a pt_rcu_head member.
>>
>> Hi David, it seems we still need to keep ARCH_SUPPORTS_PT_RECLAIM?
>
> Or we invert it and only disable it for the known-problematic
> architectures?

Yes, the problem lies with those architectures that support
MMU_GATHER_RCU_TABLE_FREE and define their own __tlb_remove_table().

So my plan is as follows:

1. convert __HAVE_ARCH_TLB_REMOVE_TABLE to a
   CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config option
2. make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE &&
   !HAVE_ARCH_TLB_REMOVE_TABLE

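
The resulting dependency can be written down as a simple predicate. The
sketch below is only an illustration in plain C, not Kconfig and not a
proposed implementation: struct demo_arch and demo_pt_reclaim_enabled()
are hypothetical names, and the two booleans stand for the
MMU_GATHER_RCU_TABLE_FREE and HAVE_ARCH_TLB_REMOVE_TABLE symbols from the
plan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-architecture view of the two config symbols. */
struct demo_arch {
	const char *name;
	bool mmu_gather_rcu_table_free;		/* MMU_GATHER_RCU_TABLE_FREE */
	bool have_arch_tlb_remove_table;	/* HAVE_ARCH_TLB_REMOVE_TABLE */
};

/*
 * PT_RECLAIM would be enabled only when page tables are freed via RCU
 * and the architecture does not override __tlb_remove_table().
 */
static bool demo_pt_reclaim_enabled(const struct demo_arch *arch)
{
	return arch->mmu_gather_rcu_table_free &&
	       !arch->have_arch_tlb_remove_table;
}
```

A configuration shaped like the sparc64 case in this thread (RCU table
freeing plus its own __tlb_remove_table()) evaluates to false, so
PT_RECLAIM stays off there, while an architecture that uses the generic
__tlb_remove_table() keeps it enabled.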
I'll send v4 soon.
Thanks,
Qi
>