Message-ID: <9f5600b3-6525-4045-ad1f-4408dfc9ce0f@redhat.com>
Date: Wed, 19 Feb 2025 09:16:15 -0500
From: Luiz Capitulino <luizcap@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, LKML <linux-kernel@...r.kernel.org>,
linux-mm@...ck.org
Cc: ardb@...nel.org
Subject: Re: kernel BUG at arch/arm64/mm/mmu.c:185!
On 2025-02-19 03:41, Ryan Roberts wrote:
> On 19/02/2025 02:27, Luiz Capitulino wrote:
>> Hi,
>>
>> I'm getting the crash below with Linus' tree at commit
>> 2408a807bfc3f738850ef5ad5e3fd59d66168996 on a two-socket Ampere Mt. Jade.
>
> Thanks for the bug report. I'll take a look this morning, but I'm off work
> tomorrow and Friday, so if I can't figure it out before the end of the day I
> won't be able to look again until Monday, unless someone can pick it up in the
> meantime.
No rush at all. Please, enjoy your time off :)
> Anyway, is there a specific config you're compiling for? And what about kernel
> command line args?
The config is attached. The kernel command line is:
"""
ro crashkernel=1G-4G:406M,4G-64G:470M,64G-:726M rd.lvm.lv=cs_ampere-mtjade-altra-03/root rd.lvm.lv=cs_ampere-mtjade-altra-03/swap earlycon=pl011,mmio,0x100002600000
"""
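For reference, the crashkernel= value above uses the kernel's range-based syntax, <range>:<size>[,<range>:<size>...], where each range is start-[end] (start inclusive, end exclusive, empty end meaning no upper bound) and the entry matching the machine's RAM size is the one used. Below is a minimal user-space sketch of that selection, assuming that syntax and ignoring the optional @offset suffix and error handling; crashkernel_size() and parse_size() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal number with an optional K/M/G suffix. */
static uint64_t parse_size(const char **p)
{
	char *end;
	uint64_t v = strtoull(*p, &end, 10);

	switch (*end) {
	case 'G': v <<= 10; /* fall through */
	case 'M': v <<= 10; /* fall through */
	case 'K': v <<= 10; end++; break;
	default: break;
	}
	*p = end;
	return v;
}

/*
 * Hypothetical helper: return the reservation (in bytes) that a range list
 * such as "1G-4G:406M,4G-64G:470M,64G-:726M" selects for a machine with
 * ram bytes of memory, or 0 if no range matches. Sketch only: no @offset.
 */
static uint64_t crashkernel_size(const char *spec, uint64_t ram)
{
	const char *p = spec;

	assert(spec != NULL);
	while (*p) {
		uint64_t start = parse_size(&p);
		uint64_t end = UINT64_MAX;	/* "64G-" style: no upper bound */
		uint64_t size;

		if (*p == '-')
			p++;
		if (*p != ':')
			end = parse_size(&p);
		if (*p == ':')
			p++;
		size = parse_size(&p);

		/* Ranges are [start, end). */
		if (ram >= start && ram < end)
			return size;

		/* Stop at the end of the list (or on anything unexpected). */
		if (*p != ',')
			break;
		p++;
	}
	return 0;
}
```

For example, with the string above and 512G of RAM, crashkernel_size() skips the first two ranges and falls through to the 64G- entry, returning 726M.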
> Is it 100% reproducible for you?
That is a good question. Right now it is (I just tried again with the latest Linus
tree, commit 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2). But I do recall being
able to boot a bad kernel a few times.
Btw, I'll try to bisect again and will also try to update the system's firmware
just in case.
> How much RAM does your system have? (I have 2
> socket Mt. Jade with 512G; I'll try to repro on that).
Mine has 512G too; maybe we're lucky and it's the same system.
>> It happens very early during boot. Passing 'nokaslr' on the command line works
>> around the issue (i.e. I can boot and use the system normally). It doesn't seem
>> to happen with 6.13. I tried bisecting it but got nowhere...
>>
>> [ 0.000000] ------------[ cut here ]------------
>> [ 0.000000] kernel BUG at arch/arm64/mm/mmu.c:185!
>
> This is:
>
> /*
> * After the PTE entry has been populated once, we
> * only allow updates to the permission attributes.
> */
> BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(__ptep_get(ptep))));
>
> So we have a valid -> valid PTE transition where either the PFNs are changing,
> we are trying to change permissions on a contiguous entry, we are trying to
> transition from non-global to global, or we are trying to change other
> explicitly disallowed bits.
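To make that list concrete, here is a simplified user-space model of the four conditions. It is a sketch, not the in-tree pgattr_change_is_safe(): the bit positions assume a 4K-granule stage 1 descriptor with a 48-bit output address, the M_PTE_* macros and model_pgattr_change_is_safe() are hypothetical names, and any extra cases the real function may handle (for example around memory-type attributes) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for a 4K-granule, 48-bit-PA stage 1 descriptor. */
#define M_PTE_VALID   (1ULL << 0)
#define M_PTE_RDONLY  (1ULL << 7)   /* AP[2] */
#define M_PTE_NG      (1ULL << 11)  /* not-Global */
#define M_PTE_WRITE   (1ULL << 51)  /* DBM, used by Linux as "writable" */
#define M_PTE_CONT    (1ULL << 52)  /* contiguous hint */
#define M_PTE_PXN     (1ULL << 53)  /* privileged execute-never */
#define M_PTE_ADDR    0x0000fffffffff000ULL /* output address, bits [47:12] */

static_assert((M_PTE_ADDR & (M_PTE_VALID | M_PTE_RDONLY | M_PTE_NG |
			     M_PTE_WRITE | M_PTE_CONT | M_PTE_PXN)) == 0,
	      "address field overlaps attribute bits");

/* Hypothetical model of the valid -> valid check that the BUG_ON relies on. */
static bool model_pgattr_change_is_safe(uint64_t old_pte, uint64_t new_pte)
{
	/* Bits this model lets change on a live mapping. */
	const uint64_t mask = M_PTE_PXN | M_PTE_RDONLY | M_PTE_WRITE | M_PTE_NG;

	/* Creating or tearing down a mapping is always fine. */
	if (!(old_pte & M_PTE_VALID) || !(new_pte & M_PTE_VALID))
		return true;
	/* A live entry must keep pointing at the same physical page. */
	if ((old_pte & M_PTE_ADDR) != (new_pte & M_PTE_ADDR))
		return false;
	/* Live contiguous entries are treated as untouchable. */
	if ((old_pte | new_pte) & M_PTE_CONT)
		return false;
	/* Going from non-global to global is not allowed. */
	if (old_pte & ~new_pte & M_PTE_NG)
		return false;
	/* Everything else may differ only in the bits in mask. */
	return ((old_pte ^ new_pte) & ~mask) == 0;
}
```

In this model, setting RDONLY or nG on a live non-contiguous entry passes, while clearing nG, changing the output address, touching a contiguous entry, or flipping any other bit fails, which is the kind of transition that would trip the BUG_ON at mmu.c:185.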
>
>> [ 0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> [ 0.000000] Modules linked in:
>> [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc3+ #8
>> [ 0.000000] pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 0.000000] pc : alloc_init_cont_pte+0x20c/0x3d0
>> [ 0.000000] lr : alloc_init_cont_pte+0x204/0x3d0
>> [ 0.000000] sp : ffffb45836ec78b0
>> [ 0.000000] x29: ffffb45836ec7940 x28: ffff6fea00000000 x27: 0068000000000f07
>> [ 0.000000] x26: ffff6fea00200000 x25: 0000400000000000 x24: ffffffffff433000
>> [ 0.000000] x23: dfff800000000000 x22: 0000d01600000000 x21: 0068000000000f07
>> [ 0.000000] x20: ffff6fea00000000 x19: ffff6fea00010000 x18: 00000000ae5a3fb1
>> [ 0.000000] x17: 0000000000001114 x16: 00000000bfc60000 x15: 0000000000000200
>> [ 0.000000] x14: 0000000000000000 x13: 1ffff68b06dd8f1c x12: 00000000f1f1f1f1
>> [ 0.000000] x11: ffff768b06dd8f1c x10: ffffb45835a1ca38 x9 : 0000000000000000
>> [ 0.000000] x8 : 0000000041b58ab3 x7 : 0000000000000000 x6 : 0000000000000000
>> [ 0.000000] x5 : 006840000a861f07 x4 : 000000000000a861 x3 : 000000000000a861
>> [ 0.000000] x2 : 006840000a861f03 x1 : 0068400000000f07 x0 : 0000000000000000
>> [ 0.000000] Call trace:
>> [ 0.000000] alloc_init_cont_pte+0x20c/0x3d0 (P)
>> [ 0.000000] alloc_init_cont_pmd+0x20c/0x4d0
>> [ 0.000000] alloc_init_pud+0x244/0x400
>> [ 0.000000] create_kpti_ng_temp_pgd+0xf8/0x1c8
>
> This is an alias for __create_pgd_mapping_locked(), so I suspect we are actually
> in __map_memblock().
>
>> [ 0.000000] map_mem.constprop.0+0x1d8/0x3b8
>> [ 0.000000] paging_init+0x98/0x330
>> [ 0.000000] setup_arch+0xac/0x170
>> [ 0.000000] start_kernel+0x74/0x3c8
>> [ 0.000000] __primary_switched+0x8c/0xa0
>> [ 0.000000] Code: f9400301 97ffff64 72001c1f 54fffe21 (d4210000)
>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>> [ 0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
>> [ 0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
>>
>
> So I guess either we are setting a PTE entry into a table for the first time,
> where somehow the table has not been initially cleared (very unlikely), or we are
> trying to update the permissions of an already mapped pte. In the latter case,
> I think we should only be remapping the kernel image portion of the linear map.
>
> I can't see any obvious recent changes in this area. I'll see if I can repro and
> poke around a bit more.
OK, maybe you'll be able to reproduce with the config I'm attaching.
View attachment "config" of type "text/plain" (226776 bytes)