Message-ID: <Z72lNoOdZp5_kiT7@J2N7QTR9R3>
Date: Tue, 25 Feb 2025 11:10:46 +0000
From: Mark Rutland <mark.rutland@....com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Luiz Capitulino <luizcap@...hat.com>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
ardb@...nel.org,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
Catalin Marinas <Catalin.Marinas@....com>,
Will Deacon <will@...nel.org>
Subject: Re: kernel BUG at arch/arm64/mm/mmu.c:185!
On Tue, Feb 25, 2025 at 09:47:30AM +0000, Ryan Roberts wrote:
> (Adding arm folks for visibility)
>
> See original report here for context:
> https://lore.kernel.org/all/a3d9acbe-07c2-43b6-9ba9-a7585f770e83@redhat.com/
>
> TL;DR is that 6.14 doesn't boot on Ampere Altra when kaslr is enabled.
>
>
> On 20/02/2025 20:08, Luiz Capitulino wrote:
> > On 2025-02-19 09:40, Luiz Capitulino wrote:
> >
> >>>> Btw, I'll try to bisect again and will also try to update the system's firmware
> >>>> just in case.
> >
> > I tried to bisect it and again, got nowhere.
> >
> > Git bisect says the first bad commit is 8883957b3c9de2087fb6cf9691c1188cccf1ac9c .
> > But I'm able to boot that tree...
> >
>
> OK, think I've found the dodgy commit:
>
> Commit 62cffa496aac ("arm64/mm: Override PARange for !LPA2 and use it consistently")
>
> Based on the changes it certainly looks like it could be the issue, but I
> haven't spotted exactly what the problem is yet. Ard, could you take a look?
>
> I managed to hack multi ram bank support into kvmtool, so I can now repro the
> issue in virtualization. Then was able to bisect to get to the above commit.
If you're able to repro this, could you please share the configuration of
memory banks you're using, and could you hack the BUG() to dump more
info, e.g. something like the UNTESTED patch below?
Knowing the VA will tell us whether we're spilling out of the expected VA
region or otherwise going wildly wrong with addressing, and the values in the
PTEs will tell us what's specifically triggering the warning.
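
As a rough userspace sketch of the "is the VA in the expected region" check: the helpers below are hypothetical (not kernel APIs), and they assume the arm64 linear map spans [-(1 << VA_BITS), -(1 << (VA_BITS - 1))), which should be confirmed against arch/arm64/include/asm/memory.h for the kernel under test.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: the arm64 linear map starts at -(1 << va_bits) and ends
 * (exclusive) at -(1 << (va_bits - 1)). Hypothetical helpers for
 * illustration only; these are not kernel functions.
 */
static uint64_t linear_map_start(unsigned int va_bits)
{
	/* Unsigned negation wraps, giving e.g. 0xffff000000000000 for 48. */
	return -(UINT64_C(1) << va_bits);
}

static uint64_t linear_map_end(unsigned int va_bits)
{
	return -(UINT64_C(1) << (va_bits - 1));
}

/* Returns true if the VA printed by the panic lies inside the assumed range. */
static bool va_in_linear_map(uint64_t va, unsigned int va_bits)
{
	return va >= linear_map_start(va_bits) && va < linear_map_end(va_bits);
}
```

Feeding the VA from the panic message into va_in_linear_map() with the configured VA size gives a quick first answer on whether the mapping spilled outside the assumed linear region.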
Also, if you're able to test with CONFIG_DEBUG_VIRTUAL, that might spot if we
have a dodgy VA->PA conversion somewhere, which can
Mark.
---->8----
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8b..d04719919de33 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -171,19 +171,22 @@ static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
 {
 	do {
 		pte_t old_pte = __ptep_get(ptep);
+		pte_t new_pte = pfn_pte(__phys_to_pfn(phys), prot);

 		/*
-		 * Required barriers to make this visible to the table walker
-		 * are deferred to the end of alloc_init_cont_pte().
+		 * After the PTE entry has been populated once, we
+		 * only allow updates to the permission attributes.
 		 */
-		__set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+		if (!pgattr_change_is_safe(pte_val(old_pte), pte_val(new_pte))) {
+			panic("Unsafe PTE change @ VA:0x%016lx PA:%pa::0x%016llx -> 0x%016llx\n",
+			      addr, &phys, pte_val(old_pte), pte_val(new_pte));
+		}

 		/*
-		 * After the PTE entry has been populated once, we
-		 * only allow updates to the permission attributes.
+		 * Required barriers to make this visible to the table walker
+		 * are deferred to the end of alloc_init_cont_pte().
 		 */
-		BUG_ON(!pgattr_change_is_safe(pte_val(old_pte),
-					      pte_val(__ptep_get(ptep))));
+		__set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));

 		phys += PAGE_SIZE;
 	} while (ptep++, addr += PAGE_SIZE, addr != end);
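
For reading the old and new PTE values the panic would print, a small userspace classifier along these lines might help. It is only loosely modelled on the kind of checks pgattr_change_is_safe() performs and is not the kernel's implementation. The bit positions (valid = bit 0, nG = bit 11, contiguous = bit 52, output address in bits [47:12]) are assumptions for 4K pages without LPA2 and should be checked against arch/arm64/include/asm/pgtable-hwdef.h. All names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED descriptor layout (4K pages, no LPA2); verify before relying on it. */
#define SK_PTE_VALID     (UINT64_C(1) << 0)
#define SK_PTE_NG        (UINT64_C(1) << 11)
#define SK_PTE_CONT      (UINT64_C(1) << 52)
#define SK_PTE_ADDR_MASK UINT64_C(0x0000fffffffff000) /* bits [47:12] */

enum pte_change {
	PTE_CHANGE_CREATE_OR_REMOVE, /* either side invalid: nothing live changes */
	PTE_CHANGE_ADDR,             /* output address differs on a live entry */
	PTE_CHANGE_CONT,             /* contiguous bit set on a live entry */
	PTE_CHANGE_NG_TO_GLOBAL,     /* non-global to global transition */
	PTE_CHANGE_OTHER,            /* compare the remaining attribute bits by hand */
};

/* Hypothetical helper: classify an old -> new PTE transition. */
static enum pte_change classify_pte_change(uint64_t old_val, uint64_t new_val)
{
	if (!(old_val & SK_PTE_VALID) || !(new_val & SK_PTE_VALID))
		return PTE_CHANGE_CREATE_OR_REMOVE;
	if ((old_val & SK_PTE_ADDR_MASK) != (new_val & SK_PTE_ADDR_MASK))
		return PTE_CHANGE_ADDR;
	if ((old_val | new_val) & SK_PTE_CONT)
		return PTE_CHANGE_CONT;
	if (old_val & ~new_val & SK_PTE_NG)
		return PTE_CHANGE_NG_TO_GLOBAL;
	return PTE_CHANGE_OTHER;
}
```

A PTE_CHANGE_ADDR result would suggest the same VA is being mapped twice with different physical addresses, which is the sort of detail the extra panic output is meant to expose.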