Message-ID: <20250103161637.GA3921@willie-the-truck>
Date: Fri, 3 Jan 2025 16:16:38 +0000
From: Will Deacon <will@...nel.org>
To: Itai Handler <itai.handler@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
mark.rutland@....com, ardb@...nel.org, usamaarif642@...il.com
Subject: Re: Issues with kexec on arm64
On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> [Sorry about my previous e-mail on this subject. It got corrupted.
> Please ignore it.]
>
> Hello,
>
> I'm encountering kernel panics / system hangs when attempting to
> kexec a vmlinux file on arm64 architecture.
>
> It happens both on qemu and on real hardware.
>
> These issues occur on all kernels from v4.19 to the latest mainline.
I think other folks have been using kexec on arm64, so something smells
fishy here. Is the issue intermittent?
> A sample panic output looks as follows:
> kernel BUG at arch/arm64/mm/mmu.c:217!
> Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
> Hardware name: linux,dummy-virt (DT)
> pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __create_pgd_mapping+0xe8/0x3b0
> lr : __create_pgd_mapping+0x44/0x3b0
> sp : fffffe00804d3c20
> x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
> x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
> x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
> x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
> x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
> x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
> x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
> x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
> Call trace:
> __create_pgd_mapping+0xe8/0x3b0
> map_kernel_segment+0x74/0xb0
> paging_init+0xec/0x4f8
> setup_arch+0x234/0x52c
> start_kernel+0x64/0x500
> __primary_switched+0xb4/0xbc
> Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops - BUG: Fatal exception

So this explodes because we find a page-table entry at the pmd level
that we don't like the look of:

  - It's not a block entry
  - It's not all zeroes
  - It's also not a table

Sadly, the actual value is clobbered by the time we take the BUG():

   0:   f9400300    ldr   x0, [x24]
   4:   92400400    and   x0, x0, #0x3
   8:   f1000c1f    cmp   x0, #0x3
   c:   54000060    b.eq  0x18  // b.none
  10:*  d4210000    brk   #0x800        <-- trapping instruction

Maybe dumping 'pmd_val(pmd)' before we crash would be instructive? Maybe
it's a pointer...
> I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> ("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.
Hmm. I wonder if the rodata section isn't being loaded properly? Can you
add some traces to check that, please?
Will