linux-kernel - Re: Issues with kexec on arm64

Message-ID: <CAFpOueR32hZj5=fnSQqwD7zitkaeGkzj9W_D1RW5q72RLxkmgg@mail.gmail.com>
Date: Tue, 7 Jan 2025 11:46:01 +0200
From: Itai Handler <itai.handler@...il.com>
To: Mark Rutland <mark.rutland@....com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	Will Deacon <will@...nel.org>, ardb@...nel.org
Subject: Re: Issues with kexec on arm64

On Mon, Jan 6, 2025 at 4:02 PM Mark Rutland <mark.rutland@....com> wrote:
>
> On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> > [Sorry about my previous e-mail on this subject. It got corrupted.
> > Please ignore it.]
> >
> > Hello,
>
> Hi,
>
> >
> > I'm encountering kernel panics / system hangs when attempting to
> > kexec a vmlinux file on arm64 architecture.
> >
> > It happens both on qemu and on real hardware.
> >
> > These issues occur on all kernels from v4.19 to the latest mainline.
> > A sample panic output looks as follows:
> >   kernel BUG at arch/arm64/mm/mmu.c:217!
> >   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >   CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
> >   Hardware name: linux,dummy-virt (DT)
> >   pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >   pc : __create_pgd_mapping+0xe8/0x3b0
> >   lr : __create_pgd_mapping+0x44/0x3b0
> >   sp : fffffe00804d3c20
> >   x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
> >   x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
> >   x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
> >   x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
> >   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> >   x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
> >   x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
> >   x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
> >   x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
> >   x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
> >   Call trace:
> >    __create_pgd_mapping+0xe8/0x3b0
> >    map_kernel_segment+0x74/0xb0
> >    paging_init+0xec/0x4f8
> >    setup_arch+0x234/0x52c
> >    start_kernel+0x64/0x500
> >    __primary_switched+0xb4/0xbc
> >   Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
> >   ---[ end trace 0000000000000000 ]---
> >   Kernel panic - not syncing: Oops - BUG: Fatal exception
> >
> > I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> > ("arm64/mm: move runtime pgds to rodata"), which was added in v4.19.
> >
> > I also reconstructed the full call trace (by adding "noinline" to the
> > relevant functions):
> >   alloc_init_cont_pte+0x6c/0x1e0
> >   init_pmd+0x154/0x1c8
> >   alloc_init_cont_pmd+0x11c/0x174
> >   alloc_init_pud+0xc4/0x148
> >   __create_pgd_mapping+0xa8/0x130
> >   map_kernel_segment+0xc8/0x168
> >   map_kernel+0x98/0x1a8
> >   paging_init+0x7c/0x418
> >   setup_arch+0x224/0x570
> >   start_kernel+0x5c/0x4f0
> >
>
> Does your system have GICv3 and an ITS? If so, and assuming you're not
> using EFI to boot in the first place, what *might* be happening here is
> that the GIC is still using property/pending tables allocated by the
> first kernel, and after that memory gets reallocated, the GIC writes
> back and corrupts that memory. That would be very sensitive to memory
> layout, which could explain why the bisect leads to something that
> changes that.
>
> We have a solution for that with EFI (where we can use a configuration
> table to indicate that the memory is in use), but we don't currently
> have a solution in the absence of EFI, and we should probably forbid
> kexec in that case...
>
> Mark.
>

Hi Mark,

No, neither the real hardware nor the qemu VM has GICv3.
They both have GICv2.
Is kexec supported with GICv2 (assuming I'm not using EFI to boot)?

Thanks,
Itai Handler