[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cf2b32a4-2217-4a31-b6d7-e60a9f4ef7dd@arm.com>
Date: Wed, 19 Feb 2025 14:26:44 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Luiz Capitulino <luizcap@...hat.com>, LKML
<linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Cc: ardb@...nel.org
Subject: Re: kernel BUG at arch/arm64/mm/mmu.c:185!
On 19/02/2025 14:16, Luiz Capitulino wrote:
> On 2025-02-19 03:41, Ryan Roberts wrote:
>> On 19/02/2025 02:27, Luiz Capitulino wrote:
>>> Hi,
>>>
>>> I'm getting the crash below with Linus tree commit
>>> 2408a807bfc3f738850ef5ad5e3fd59d66168996 on a Ampere Mt. Jade with two sockets
>>> (backtrace below).
>>
>> Thanks for the bug report, I'll take a look this morning, but I'm off work
>> tomorrow and Friday so if I can't figure it out before end of day I won't be
>> able to look again until Monday, unless someone can pick it up in the meantime.
>
> No rush at all. Please, enjoy your time off :)
Afraid I've run out of time on this for today, so adding some details below.
I'll come back to it next week unless someone else steps in.
>
>> Anyway, is there a specific config you're compiling for? And what about kernel
>> command line args?
>
> Config is attached. The kernel command-line is:
>
> """
> ro crashkernel=1G-4G:406M,4G-64G:470M,64G-:726M rd.lvm.lv=cs_ampere-mtjade-
> altra-03/root rd.lvm.lv=cs_ampere-mtjade-altra-03/swap
> earlycon=pl011,mmio,0x100002600000
> """
>
>> Is it 100% reproducible for you?
>
> That is a good question. Right now it is (just tried again with latest Linus
> tree 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2). But I do have the recollection
> that I was able to boot a bad kernel a few times.
>
> Btw, I'll try to bisect again and will also try to update the system's firmware
> just in case.
>
>> How much RAM does your system have? (I have 2
>> socket Mt. Jade with 512G; I'll try to repro on that).
>
> Mine is 512G, maybe we're lucky and it's the same system.
>
>>> It happens very early during boot. Passing 'nokaslr' in the command-line works
>>> around the issue (ie. I can boot and use the system normally). Doesn't seem to
>>> happen with 6.13. I tried bisecting it but got nowhere...
>>>
>>> [ 0.000000] ------------[ cut here ]------------
>>> [ 0.000000] kernel BUG at arch/arm64/mm/mmu.c:185!
>>
>> This is:
>>
>> /*
>> * After the PTE entry has been populated once, we
>> * only allow updates to the permission attributes.
>> */
>> BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(__ptep_get(ptep))));
>>
>> So we have a valid -> valid PTE transition where either the PFNs are changing,
>> we are trying to change permissions on a contiguous entry, we are trying to
>> transition from non-global to global, or we are trying to change other
>> explicitly disallowed bits.
>>
>>> [ 0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>> [ 0.000000] Modules linked in:
>>> [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc3+ #8
>>> [ 0.000000] pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [ 0.000000] pc : alloc_init_cont_pte+0x20c/0x3d0
>>> [ 0.000000] lr : alloc_init_cont_pte+0x204/0x3d0
>>> [ 0.000000] sp : ffffb45836ec78b0
>>> [ 0.000000] x29: ffffb45836ec7940 x28: ffff6fea00000000 x27: 0068000000000f07
>>> [ 0.000000] x26: ffff6fea00200000 x25: 0000400000000000 x24: ffffffffff433000
>>> [ 0.000000] x23: dfff800000000000 x22: 0000d01600000000 x21: 0068000000000f07
>>> [ 0.000000] x20: ffff6fea00000000 x19: ffff6fea00010000 x18: 00000000ae5a3fb1
>>> [ 0.000000] x17: 0000000000001114 x16: 00000000bfc60000 x15: 0000000000000200
>>> [ 0.000000] x14: 0000000000000000 x13: 1ffff68b06dd8f1c x12: 00000000f1f1f1f1
>>> [ 0.000000] x11: ffff768b06dd8f1c x10: ffffb45835a1ca38 x9 : 0000000000000000
>>> [ 0.000000] x8 : 0000000041b58ab3 x7 : 0000000000000000 x6 : 0000000000000000
>>> [ 0.000000] x5 : 006840000a861f07 x4 : 000000000000a861 x3 : 000000000000a861
>>> [ 0.000000] x2 : 006840000a861f03 x1 : 0068400000000f07 x0 : 0000000000000000
>>> [ 0.000000] Call trace:
>>> [ 0.000000] alloc_init_cont_pte+0x20c/0x3d0 (P)
>>> [ 0.000000] alloc_init_cont_pmd+0x20c/0x4d0
>>> [ 0.000000] alloc_init_pud+0x244/0x400
>>> [ 0.000000] create_kpti_ng_temp_pgd+0xf8/0x1c8
>>
>> This is an alias for __create_pgd_mapping_locked() so I suspect we are actually
>> in __map_memblock().
>>
>>> [ 0.000000] map_mem.constprop.0+0x1d8/0x3b8
>>> [ 0.000000] paging_init+0x98/0x330
>>> [ 0.000000] setup_arch+0xac/0x170
>>> [ 0.000000] start_kernel+0x74/0x3c8
>>> [ 0.000000] __primary_switched+0x8c/0xa0
>>> [ 0.000000] Code: f9400301 97ffff64 72001c1f 54fffe21 (d4210000)
>>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>>> [ 0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
>>> [ 0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
>>> exception ]---
>>>
>>
>> So I guess either we are setting a PTE entry into a table for the first time,
>> where somehow the table has not been initially cleared (very unlikely) or we are
>> trying to update the permissions of an already mapped pte. In that latter case,
>> I think we should only be remapping the kernel image portion of the linear map.
>>
>> I can't see any obvious recent changes in this area. I'll see if I can repro and
>> poke around a bit more.
>
> OK, maybe you'll be able to reproduce with the config I'm attaching.
I can reproduce _a_ panic, but it's different from the one you shared. I'm
running defconfig on Ampere Altra with 2 sockets and 512G RAM. It appears to
repro reliably as long as kaslr is enabled.
I tried reproduing on VM, but with no luck. I suspect there is something about
the physical layout of memory that provokes the bug. I tried to force the memory
layout to match Altra using kvmtool but it only supports a single physical
region currently. And merging all the regions into 1 uber region is too big and
the VMM fails. So I think we are stuck having to keep rebooting the bare metal.
The first warning is due to getting a PFN that's outside the bounds of supported
PFNs:
Loading Linux 6.14.0-rc3-00012-g2408a807bfc3 ...
Loading initial ramdisk ...
EFI stub: Booting Linux Kernel...
EFI stub: WARNING: Working around broken SetVirtualAddressMap()
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
[ 0.000000] Booting Linux on physical CPU 0x0000120000 [0x413fd0c1]
[ 0.000000] Linux version 6.14.0-rc3-00012-g2408a807bfc3 (tuxmake@...make) (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT @1739817505
[ 0.000000] KASLR enabled
[ 0.000000] earlycon: pl11 at MMIO 0x0000100002600000 (options '')
[ 0.000000] printk: legacy bootconsole [pl11] enabled
[ 0.000000] efi: EFI v2.7 by American Megatrends
[ 0.000000] efi: ACPI 2.0=0xaf170000 SMBIOS 3.0=0xb2e6ff98 MEMATTR=0xa9568298 ESRT=0xa9659f98 RNG=0xaefb0018 MEMRESERVE=0xa9656918
[ 0.000000] random: crng init done
[ 0.000000] esrt: Reserving ESRT space from 0x00000000a9659f98 to 0x00000000a9659fd0.
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000AF170000 000024 (v02 Ampere)
[ 0.000000] ACPI: XSDT 0x00000000AF160000 0000D4 (v01 Ampere Altra 00000000 AMI 01000013)
[ 0.000000] ACPI: FACP 0x00000000AF140000 000114 (v06 Ampere Altra 00000000 INTL 20190509)
[ 0.000000] ACPI: DSDT 0x00000000AF0C0000 02F09A (v02 Ampere Jade 00000001 INTL 20200717)
[ 0.000000] ACPI: FACS 0x00000000AF1D0000 000040
[ 0.000000] ACPI: DBG2 0x00000000AF150000 00005C (v00 Ampere Altra 00000000 INTL 20190509)
[ 0.000000] ACPI: GTDT 0x00000000AF130000 000110 (v03 Ampere Altra 00000000 INTL 20190509)
[ 0.000000] ACPI: SSDT 0x00000000AF120000 00002D (v02 Ampere Altra 00000001 INTL 20190509)
[ 0.000000] ACPI: BERT 0x00000000AF110000 000030 (v01 Ampere Altra 00000001 INTL 20200717)
[ 0.000000] ACPI: EINJ 0x00000000AF100000 000150 (v01 Ampere Altra 00000001 INTL 20200717)
[ 0.000000] ACPI: SDEI 0x00000000AF0F0000 000024 (v01 Ampere Altra 00000001 INTL 20200717)
[ 0.000000] ACPI: SPMI 0x00000000AF0B0000 000041 (v05 ALASKA A M I 00000000 AMI. 00000000)
[ 0.000000] ACPI: SPMI 0x00000000AF0A0000 000041 (v05 ALASKA A M I 00000000 AMI. 00000000)
[ 0.000000] ACPI: SPMI 0x00000000AF090000 000041 (v05 ALASKA A M I 00000000 AMI. 00000000)
[ 0.000000] ACPI: FIDT 0x00000000AF080000 00009C (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: SPCR 0x00000000AF070000 000050 (v02 A M I APTIO V 01072009 AMI. 0005000F)
[ 0.000000] ACPI: PPTT 0x00000000AF050000 006E60 (v02 Ampere Altra 00000000 AMP. 01000013)
[ 0.000000] ACPI: SLIT 0x00000000AF040000 000030 (v01 Ampere Altra 00000000 AMP. 01000013)
[ 0.000000] ACPI: SRAT 0x00000000AF030000 000CF0 (v03 Ampere Altra 00000000 AMP. 01000013)
[ 0.000000] ACPI: HEST 0x00000000AF020000 000878 (v01 Ampere Altra 00000001 ARM 00000099)
[ 0.000000] ACPI: MCFG 0x00000000AF010000 0000DC (v01 Ampere Altra 00000001 AMP. 01000013)
[ 0.000000] ACPI: IORT 0x00000000AF000000 000844 (v00 Ampere Altra 00000000 AMP. 01000013)
[ 0.000000] ACPI: APIC 0x00000000AF060000 003354 (v05 Ampere Altra 00000003 AMI 01000013)
[ 0.000000] ACPI: PCCT 0x00000000AEFF0000 000ABC (v02 Ampere Altra 00000003 AMP. 01000013)
[ 0.000000] ACPI: WSMT 0x00000000AEFE0000 000028 (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: FPDT 0x00000000AEFD0000 000044 (v01 ALASKA A M I 01072009 AMI 01000013)
[ 0.000000] ACPI: SPCR: console: pl011,mmio32,0x100002600000,115200
[ 0.000000] ACPI: Use ACPI SPCR as default console: Yes
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x88300000-0x883fffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x90000000-0xbfffffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x80000000000-0x8007fffffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x800c0000000-0x83fffffffff]
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x400000000000-0x4000bfffffff]
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x400100000000-0x403fffffffff]
[ 0.000000] NUMA: Node 0 [mem 0x88300000-0x883fffff] + [mem 0x90000000-0xbfffffff] -> [mem 0x88300000-0xbfffffff]
[ 0.000000] NUMA: Node 0 [mem 0x88300000-0xbfffffff] + [mem 0x80000000000-0x8007fffffff] -> [mem 0x88300000-0x8007fffffff]
[ 0.000000] NUMA: Node 0 [mem 0x88300000-0x8007fffffff] + [mem 0x800c0000000-0x83fffffffff] -> [mem 0x88300000-0x83fffffffff]
[ 0.000000] NUMA: Node 1 [mem 0x400000000000-0x4000bfffffff] + [mem 0x400100000000-0x403fffffffff] -> [mem 0x400000000000-0x403fffffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x83fffffd9c0-0x83fffffffff]
[ 0.000000] NODE_DATA(1) allocated [mem 0x403fc00c19c0-0x403fc00c3fff]
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at mm/sparse.c:142 sparse_init+0xbc/0x49c
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc3-00012-g2408a807bfc3 #1
[ 0.000000] pstate: 600000c9 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : sparse_init+0xbc/0x49c
[ 0.000000] lr : sparse_init+0xd8/0x49c
[ 0.000000] sp : ffffadea4aac3c80
[ 0.000000] x29: ffffadea4aac3c80 x28: 0000000000000000 x27: 0000000000000004
[ 0.000000] x26: 00000000000107ff x25: ffffadea4b00cc00 x24: 0000000000002000
[ 0.000000] x23: 0000000000020000 x22: 00000004000c0000 x21: 0000000000000001
[ 0.000000] x20: 00000000b6780000 x19: 0000000400000000 x18: 0000000000000010
[ 0.000000] x17: 0000000000000004 x16: 0000403fc00c4000 x15: 0000000000000001
[ 0.000000] x14: 00000000c0000000 x13: 00000000000001b0 x12: 0000000000000014
[ 0.000000] x11: ffffadea4ab61e80 x10: ffffadea4b004a58 x9 : ffffadea4b004c08
[ 0.000000] x8 : 0000000400000000 x7 : 0000000000000012 x6 : 0000000000000013
[ 0.000000] x5 : 00004000c0000000 x4 : ffffadea4aac3cf4 x3 : ffffadea4aac3d00
[ 0.000000] x2 : ffffadea4aac3cf8 x1 : 00000000c0000000 x0 : 0000000000000000
[ 0.000000] Call trace:
[ 0.000000] sparse_init+0xbc/0x49c (P)
[ 0.000000] bootmem_init+0x7c/0x1d8
[ 0.000000] setup_arch+0x26c/0x5f8
[ 0.000000] start_kernel+0x70/0x73c
[ 0.000000] __primary_switched+0x88/0x90
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000088300000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x0000403fffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000088300000-0x00000000883fffff]
[ 0.000000] node 0: [mem 0x0000000090000000-0x0000000091ffffff]
[ 0.000000] node 0: [mem 0x0000000092000000-0x000000009277ffff]
[ 0.000000] node 0: [mem 0x0000000092780000-0x00000000aef6efff]
[ 0.000000] node 0: [mem 0x00000000aef6f000-0x00000000aef6ffff]
[ 0.000000] node 0: [mem 0x00000000aef70000-0x00000000af1cffff]
[ 0.000000] node 0: [mem 0x00000000af1d0000-0x00000000af1effff]
[ 0.000000] node 0: [mem 0x00000000af1f0000-0x00000000b0cfffff]
[ 0.000000] node 0: [mem 0x00000000b0d00000-0x00000000b79affff]
[ 0.000000] node 0: [mem 0x00000000b79b0000-0x00000000b7aeffff]
[ 0.000000] node 0: [mem 0x00000000b7af0000-0x00000000b7fdffff]
[ 0.000000] node 0: [mem 0x00000000b7fe0000-0x00000000b8068fff]
[ 0.000000] node 0: [mem 0x00000000b8069000-0x00000000b822efff]
[ 0.000000] node 0: [mem 0x00000000b822f000-0x00000000bfc3efff]
[ 0.000000] node 0: [mem 0x00000000bfc3f000-0x00000000bfc3ffff]
[ 0.000000] node 0: [mem 0x00000000bfc40000-0x00000000bfffffff]
[ 0.000000] node 0: [mem 0x0000080000000000-0x000008007fffffff]
[ 0.000000] node 0: [mem 0x00000800c0000000-0x0000083fffffffff]
[ 0.000000] node 1: [mem 0x0000400000000000-0x00004000bfffffff]
[ 0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x0000000096000004
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x04: level 0 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.000000] [0000000000000008] user address but active_mm is swapper
[ 0.000000] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G W 6.14.0-rc3-00012-g2408a807bfc3 #1
[ 0.000000] Tainted: [W]=WARN
[ 0.000000] pstate: 600000c9 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : subsection_map_init+0x9c/0xe4
[ 0.000000] lr : free_area_init+0x374/0xebc
[ 0.000000] sp : ffffadea4aac3bc0
[ 0.000000] x29: ffffadea4aac3bc0 x28: 00000000a8a394dc x27: 00000000a0cc2000
[ 0.000000] x26: 00000000000c0000 x25: 0000000400000000 x24: 0000000000008000
[ 0.000000] x23: 0000000000002000 x22: ffffadea4b00cc00 x21: 0000000000000000
[ 0.000000] x20: 0000000000080017 x19: 0000000000080000 x18: 0000000000000006
[ 0.000000] x17: 0000000000000004 x16: 0000403fc00c4000 x15: ffffadea4aac3690
[ 0.000000] x14: 0000000000000000 x13: ffffadea4aae3e48 x12: 000000000000012f
[ 0.000000] x11: 0000000000000065 x10: ffffadea4ab3be48 x9 : ffffadea4aae3e48
[ 0.000000] x8 : 00000000ffffefff x7 : ffffadea4ab3be48 x6 : 80000000fffff000
[ 0.000000] x5 : 000000000000bff4 x4 : 0000000000000800 x3 : 0000000000000000
[ 0.000000] x2 : 0000000000008000 x1 : 0000000000000000 x0 : 0000000000002000
[ 0.000000] Call trace:
[ 0.000000] subsection_map_init+0x9c/0xe4 (P)
[ 0.000000] free_area_init+0x374/0xebc
[ 0.000000] bootmem_init+0x10c/0x1d8
[ 0.000000] setup_arch+0x26c/0x5f8
[ 0.000000] start_kernel+0x70/0x73c
[ 0.000000] __primary_switched+0x88/0x90
[ 0.000000] Code: f8647863 8b010061 f100007f 9a831023 (f9400460)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Oops: Fatal exception
[ 0.000000] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
Powered by blists - more mailing lists