[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b8fe05f1-23c2-423f-88cc-310996c6aad5@loongson.cn>
Date: Fri, 25 Apr 2025 10:47:24 +0800
From: Ming Wang <wangming01@...ngson.cn>
To: Peter Xu <peterx@...hat.com>
Cc: Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
Andrew Morton <akpm@...ux-foundation.org>,
Hongchen Zhang <zhanghongchen@...ngson.cn>, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org, lixuefeng@...ngson.cn, gaojuxin@...ngson.cn,
chenhuacai@...ngson.cn
Subject: Re: [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset()
for none PMD
Hi Peter Xu,
Thanks for taking a look and raising these important points!
On 4/25/25 02:48, Peter Xu wrote:
> On Thu, Apr 24, 2025 at 04:30:37PM +0800, Ming Wang wrote:
>> LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot
>> even if the underlying entry points to invalid_pte_table (indicating no
>> mapping). Callers like smaps_hugetlb_range() fetch this invalid entry
>> value (the address of invalid_pte_table) via this pointer.
>
> So it's in smaps_hugetlb_range() context, then..
The root cause involves several steps:
1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
(or relevant level) entry is often populated by the kernel during mmap()
with a non-present entry pointing to the architecture's invalid_pte_table
On the affected LoongArch system, this address was observed to
be 0x90000000031e4000.
2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
this entry.
3. The generic is_swap_pte() macro checks `!pte_present() && !pte_none()`.
The entry (invalid_pte_table address) is not present. Crucially,
the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`)
returns false because the invalid_pte_table address is non-zero.
Therefore, is_swap_pte() incorrectly returns true.
4. The code enters the `else if (is_swap_pte(...))` block.
5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit
pattern coincidence in the invalid_pte_table address on LoongArch,
the embedded generic `is_migration_entry()` check happens to return
true (misinterpreting parts of the address as a migration type).
6. This leads to a call to pfn_swap_entry_to_page() with the bogus
swap entry derived from the invalid table address.
7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
unrelated struct page, checks its lock status (unlocked), and hits
the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.
>
>>
>> The generic is_swap_pte() check then incorrectly identifies this address
>> as a swap entry on LoongArch, because it satisfies the !pte_present()
>> && !pte_none() conditions. This misinterpretation, combined with a
>> coincidental match by is_migration_entry() on the address bits, leads
>> to kernel crashes in pfn_swap_entry_to_page().
>
> .. you mentioned !pte_none() is checked. Could you explain why it's
> pte_none() rather than huge_pte_none()? As I saw loongarch has exactly
> this..
>
> static inline int huge_pte_none(pte_t pte)
> {
> unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL;
> return !val || (val == (unsigned long)invalid_pte_table);
> }
>
> I'm not familiar with loongarch's invalid_pte_table, but I would expect at
> least for now in hugetlb specific paths it should be reported as none even
> without this patch.
>
> One more thing to mention is the kernel (AFAIU) is trying to moving away
> from hugetlb specific pgtable parsing to generic pgtable walkers. It means
> it could happen at some point where the kernel walks the hugetlb pgtables
> using normal pgtable APIs. I'm not yet sure what would happen then, but
> maybe at some point the invalid_pte_table check is needed in pte_none() too
> for loongarch.
>
> Thanks,
You asked why the check involves pte_none() rather than huge_pte_none(), given that LoongArch
provides the latter which correctly identifies the invalid_pte_table address.
That's a great question, and the crux seems to be in how the generic code path works. The crash
originates within smaps_hugetlb_range() after the generic `is_swap_pte(ptent)` macro returns true.
Looking at the definition of `is_swap_pte()` (in include/linux/mm.h or similar), it typically
expands to `!pte_present(pte) && !pte_none(pte)`.
Critically, even though `smaps_hugetlb_range()` deals with HugeTLB entries (often PMDs cast to pte_t),
the generic `is_swap_pte()` macro itself, when expanded, calls the **generic `pte_none()` macro**, not
the specialized `huge_pte_none()`.
LoongArch's generic `pte_none()` macro is defined as:
`#define pte_none(pte) (!(pte_val(pte) & ~_PAGE_GLOBAL))`
This definition does *not* check for the `invalid_pte_table` address and thus returns false for it,
leading to `is_swap_pte()` incorrectly returning true.
So, while LoongArch does provide `huge_pte_none()` which *could* correctly identify the state, it's not
actually invoked in the code path triggered by `is_swap_pte()` within `smaps_hugetlb_range()`.
This is why modifying `huge_pte_offset()` seems necessary and appropriate at the architecture level.
By returning NULL when the underlying PMD entry is none (checked using the correct `pmd_none()`, which *does*
check for invalid_pte_table on LoongArch), we prevent the invalid pointer and its problematic value from reaching
`smaps_hugetlb_range()` and subsequently fooling the generic `is_swap_pte()` check that uses the generic `pte_none()`.
Regarding your point about generic page table walkers possibly needing `pte_none()` itself to handle `invalid_pte_table`
in the future – I understand the concern. That might indeed be a separate, future enhancement needed for LoongArch's
generic page table support. However, the current patch addresses the immediate crash within the existing hugetlb-specific
walker (`smaps_hugetlb_range`) by stopping the problematic value at the source (`huge_pte_offset`), which seems like a
necessary and correct fix for the present issue.
Does this explanation clarify the interaction between the generic macros and the arch-specific helpers in this context?
Thanks,
Ming
>
>>
>> Fix this at the architecture level by modifying huge_pte_offset() to
>> check the PMD entry's content using pmd_none() before returning. If the
>> entry is none (i.e., it points to invalid_pte_table), return NULL
>> instead of the pointer to the slot.
>>
>> Co-developed-by: Hongchen Zhang <zhanghongchen@...ngson.cn>
>> Signed-off-by: Hongchen Zhang <zhanghongchen@...ngson.cn>
>> Signed-off-by: Huacai Chen <chenhuacai@...ngson.cn>
>> Signed-off-by: Ming Wang <wangming01@...ngson.cn>
>> ---
>> arch/loongarch/mm/hugetlbpage.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
>> index e4068906143b..cea84d7f2b91 100644
>> --- a/arch/loongarch/mm/hugetlbpage.c
>> +++ b/arch/loongarch/mm/hugetlbpage.c
>> @@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
>> pmd = pmd_offset(pud, addr);
>> }
>> }
>> - return (pte_t *) pmd;
>> + return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd;
>> }
>>
>> uint64_t pmd_to_entrylo(unsigned long pmd_val)
>> --
>> 2.43.0
>>
>
Powered by blists - more mailing lists