linux-kernel - Re: Issues with kexec on arm64

Message-ID: <20250103161637.GA3921@willie-the-truck>
Date: Fri, 3 Jan 2025 16:16:38 +0000
From: Will Deacon <will@...nel.org>
To: Itai Handler <itai.handler@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	mark.rutland@....com, ardb@...nel.org, usamaarif642@...il.com
Subject: Re: Issues with kexec on arm64

On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> [Sorry about my previous e-mail on this subject. It got corrupted.
> Please ignore it.]
> 
> Hello,
> 
> I'm encountering kernel panics / system hangs when attempting to
> kexec a vmlinux file on arm64 architecture.
> 
> It happens both on qemu and on real hardware.
> 
> These issues occur on all kernels from v4.19 to the latest mainline.

I think other folks have been using kexec on arm64, so something smells
fishy here. Is the issue intermittent?

> A sample panic output looks as follows:
>   kernel BUG at arch/arm64/mm/mmu.c:217!
>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>   CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
>   Hardware name: linux,dummy-virt (DT)
>   pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : __create_pgd_mapping+0xe8/0x3b0
>   lr : __create_pgd_mapping+0x44/0x3b0
>   sp : fffffe00804d3c20
>   x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
>   x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
>   x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
>   x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
>   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>   x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
>   x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
>   x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
>   x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
>   x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
>   Call trace:
>    __create_pgd_mapping+0xe8/0x3b0
>    map_kernel_segment+0x74/0xb0
>    paging_init+0xec/0x4f8
>    setup_arch+0x234/0x52c
>    start_kernel+0x64/0x500
>    __primary_switched+0xb4/0xbc
>   Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
>   ---[ end trace 0000000000000000 ]---
>   Kernel panic - not syncing: Oops - BUG: Fatal exception

So this explodes because we find a page-table entry at the pmd level
that we don't like the look of:

  - It's not a block entry
  - It's not all zeroes
  - It's also not a table

Sadly, the actual value is clobbered by the time we take the BUG():

   0:	f9400300	ldr	x0, [x24]
   4:	92400400	and	x0, x0, #0x3
   8:	f1000c1f	cmp	x0, #0x3
   c:	54000060	b.eq	0x18  // b.none
  10:*	d4210000	brk	#0x800		<-- trapping instruction

Maybe dumping 'pmd_val(pmd)' before we crash would be instructive? Maybe
it's a pointer...

> I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> ("arm64/mm: move runtime pgds to rodata"), which was added in v4.19.

Hmm. I wonder if the rodata section isn't being loaded properly? Can you
add some traces to check that, please?

Will