Message-ID: <CAFpOueQa2EOSC7=5xsWU85B0FG8xsDoqk7btMQCEV7wBdT1GQw@mail.gmail.com>
Date: Sun, 5 Jan 2025 16:46:42 +0200
From: Itai Handler <itai.handler@...il.com>
To: Will Deacon <will@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
mark.rutland@....com, ardb@...nel.org, usamaarif642@...il.com
Subject: Re: Issues with kexec on arm64
On Fri, Jan 3, 2025 at 6:16 PM Will Deacon <will@...nel.org> wrote:
>
> On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> > [Sorry about my previous e-mail on this subject. It got corrupted.
> > Please ignore it.]
> >
> > Hello,
> >
> > I'm encountering kernel panics / system hangs when attempting to
> > kexec a vmlinux file on arm64 architecture.
> >
> > It happens both on qemu and on real hardware.
> >
> > These issues occur on all kernels from v4.19 to the latest mainline.
>
> I think other folks have been using kexec on arm64, so something smells
> fishy here. Is the issue intermittent?
No, it isn't intermittent. The panics/hangs are very easy to reproduce.
At most, two recursive kexec attempts of the vmlinux file are needed.
In v6.6, using the configuration I supplied (config.sh), a single kexec
attempt is sufficient to demonstrate the issue. In that case a panic occurs
on the first kexec attempt. In newer versions I mostly see hangs but sometimes
panics as well.
Please note that the configuration I supplied sets CONFIG_ARM64_64K_PAGES=y.
I also saw issues with 4K pages, but only when some debug options were
enabled (KASAN, SCHED_DEBUG, KCSAN). Also please note that kexec with the
Image file (instead of the vmlinux file) seems to work properly, without any
issue.
>
> > A sample panic output looks as follows:
> > kernel BUG at arch/arm64/mm/mmu.c:217!
> > Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> > CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
> > Hardware name: linux,dummy-virt (DT)
> > pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : __create_pgd_mapping+0xe8/0x3b0
> > lr : __create_pgd_mapping+0x44/0x3b0
> > sp : fffffe00804d3c20
> > x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
> > x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
> > x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
> > x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
> > x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
> > x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
> > x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
> > x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
> > x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
> > Call trace:
> > __create_pgd_mapping+0xe8/0x3b0
> > map_kernel_segment+0x74/0xb0
> > paging_init+0xec/0x4f8
> > setup_arch+0x234/0x52c
> > start_kernel+0x64/0x500
> > __primary_switched+0xb4/0xbc
> > Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: Oops - BUG: Fatal exception
>
> So this explodes because we find a page-table entry at the pmd level
> that we don't like the look of:
>
> - It's not a block entry
> - It's not all zeroes
> - It's also not a table
>
> Sadly, the actual value is clobbered by the time we take the BUG():
>
> 0: f9400300 ldr x0, [x24]
> 4: 92400400 and x0, x0, #0x3
> 8: f1000c1f cmp x0, #0x3
> c: 54000060 b.eq 0x18 // b.none
> 10:* d4210000 brk #0x800 <-- trapping instruction
>
> Maybe dumping 'pmd_val(pmd)' before we crash would be instructive? Maybe
> it's a pointer...
I dumped the bad pmd (on v6.6).
It's always the same value: 128000017901ca60.
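For reference, here is a rough sketch of how that value decodes against the
check in the disassembly above (the low two bits must equal 0x3 for a table
entry). The constants mirror the arm64 PMD_TYPE_* descriptor encodings as I
understand them. The helper names classify_pmd() and looks_like_kernel_va()
are made up for illustration, and the "top 16 bits are all ones" pointer test
is only a heuristic assumption, not something the kernel does:

```python
# Descriptor type lives in bits [1:0] of the entry (arm64 encoding).
PMD_TYPE_MASK = 0x3
PMD_TYPE_TABLE = 0x3   # points to a next-level table
PMD_TYPE_SECT = 0x1    # block (section) mapping


def classify_pmd(val: int) -> str:
    """Hypothetical helper: classify a raw pmd value by its type bits."""
    if val == 0:
        return "none"
    kind = val & PMD_TYPE_MASK
    if kind == PMD_TYPE_TABLE:
        return "table"
    if kind == PMD_TYPE_SECT:
        return "block"
    # Non-zero, but neither a block nor a table: this is the case that
    # trips the BUG() in the trace above.
    return "invalid"


def looks_like_kernel_va(val: int) -> bool:
    """Heuristic (assumption): kernel virtual addresses start with 0xffff."""
    return (val >> 48) == 0xFFFF


bad_pmd = 0x128000017901CA60
print(classify_pmd(bad_pmd))          # type bits are 0b00, value non-zero
print(looks_like_kernel_va(bad_pmd))  # top 16 bits are 0x1280
```

Under these assumptions the value is neither none, block, nor table, and it
does not look like a kernel virtual address either.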
>
> > I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> > ("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.
>
> Hmm. I wonder if the rodata section isn't being loaded properly? Can you
> add some traces to check that, please?
Could you advise which traces are needed and how to add them?
Thanks,
Itai Handler