[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240718131428.GA21243@willie-the-truck>
Date: Thu, 18 Jul 2024 14:14:28 +0100
From: Will Deacon <will@...nel.org>
To: Asahi Lina <lina@...hilina.net>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, asahi@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org,
Catalin Marinas <catalin.marinas@....com>, ryan.roberts@....com,
mark.rutland@....com, ardb@...nel.org
Subject: Re: LPA2 on non-LPA2 hardware broken with 16K pages
Hi Lina, [+Ard, Mark and Ryan],
On Thu, Jul 18, 2024 at 06:39:10PM +0900, Asahi Lina wrote:
> I ran into this with the Asahi Linux downstream kernel, based on v6.9.9,
> but I believe the problem is also still upstream. The issue seems to be
> an interaction between folding one page table level at compile time and
> another one at runtime.
Thanks for reporting this!
> With this config, we have:
>
> CONFIG_PGTABLE_LEVELS=4
> PAGE_SHIFT=14
> PMD_SHIFT=25
> PUD_SHIFT=36
> PGDIR_SHIFT=47
> pgtable_l5_enabled() == false (compile time)
> pgtable_l4_enabled() == false (runtime, due to no LPA2)
I think this is 'defconfig' w/ 16k pages, although I wasn't able to
trigger the issue quickly under QEMU with that. Your analysis looks
correct, however.
> With p4d folded at compile-time, and pud folded at runtime when LPA2 is
> not supported.
>
> With this setup, pgd_offset() is broken since the pgd is actually
> supposed to become a pud but the shift is wrong, as it is set at compile
> time:
>
> #define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
>
> static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
> {
> return (pgd + pgd_index(address));
> };
>
> Then we follow the gup logic (abbreviated):
>
> gup_pgd_range:
> pgdp = pgd_offset(current->mm, addr);
> pgd_t pgd = READ_ONCE(*pgdp);
>
> At this point, pgd is just the 0th entry of the top level page table
> (since those extra address bits will always be 0 for valid 47-bit user
> addresses).
>
> p4d then gets folded via pgtable-nop4d.h:
>
> gup_p4d_range:
> p4dp = p4d_offset_lockless(pgdp, pgd, addr);
> = p4d_offset(&(pgd), address)
> = &pgd
> p4d_t p4d = READ_ONCE(*p4dp);
>
> Now we have p4dp = stack address of pgd, and p4d = pgd.
>
> gup_pud_range:
> pudp = pud_offset_lockless(p4dp, p4d, addr);
> -> if (!pgtable_l4_enabled())
> = p4d_to_folded_pud(p4dp, addr);
> = (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
> pud_t pud = READ_ONCE(*pudp);
>
> Which is bad pointer math because it only works if p4dp points to a real
> page table entry inside a page table, not a single u64 stack address.
Cheers for the explanation; I agree that 6.10 looks like it's affected
in the same way, even though I couldn't reproduce the crash. I think the
root of the problem is that p4d_offset_lockless() returns a stack
address when the p4d level is folded. I wondered about changing the
dummy pXd_offset_lockless() definitions in linux/pgtable.h to pass the
real pointer through instead of the address of the local, but then I
suppose _most_ pXd_offset() implementations are going to dereference
that and it would break the whole point of having _lockless routines
to start with.
What if we provided our own implementation of p4d_offset_lockless()
for the folding case, which could just propagate the page-table pointer?
Diff below.
> This causes random oopses in internal_get_user_pages_fast and related
> codepaths.
Do you have a reliable way to trigger those? I tried doing some GUPpy
things like strace (access_process_vm()) but it all seemed fine.
Thanks,
Will
--->8
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f8efbc128446..3afe624a39e1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1065,6 +1065,13 @@ static inline bool pgtable_l5_enabled(void) { return false; }
#define p4d_offset_kimg(dir,addr) ((p4d_t *)dir)
+static inline
+p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
+{
+ return p4d_offset(pgdp, addr);
+}
+#define p4d_offset_lockless p4d_offset_lockless
+
#endif /* CONFIG_PGTABLE_LEVELS > 4 */
#define pgd_ERROR(e) \
Powered by blists - more mailing lists