Message-ID: <c50091bdbb0556ee74ec501381f1efc14a4e5929.camel@physik.fu-berlin.de>
Date: Tue, 12 Aug 2025 14:32:00 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: Anthony Yznaga <anthony.yznaga@...cle.com>, sparclinux@...r.kernel.org,
davem@...emloft.net, andreas@...sler.com
Cc: linux-kernel@...r.kernel.org, agordeev@...ux.ibm.com, will@...nel.org,
ryan.roberts@....com, david@...hat.com, osalvador@...e.de, Meelis Roos
<mroos@...ux.ee>
Subject: Found it - was: Re: [PATCH] sparc64: fix hugetlb for sun4u

Hi Anthony,

On Mon, 2025-08-11 at 12:44 +0200, John Paul Adrian Glaubitz wrote:
> Hi,
>
> On Mon, 2025-08-11 at 00:20 +0200, John Paul Adrian Glaubitz wrote:
> > Hi,
> >
> > On Sun, 2025-08-10 at 11:52 +0200, John Paul Adrian Glaubitz wrote:
> > > On Sat, 2025-08-09 at 08:42 +0200, John Paul Adrian Glaubitz wrote:
> > > > Let me know if you have more suggestions to test. I can also provide you with full
> > > > access to this Netra 240 if you send me your public SSH key in a private mail.
> > >
> > > I have narrowed it down to a regression between v6.3 and v6.4 now.
> > >
> > > The bug can be reproduced with the sparc64_defconfig on a Sun Netra 240 by setting
> > > CONFIG_TRANSPARENT_HUGEPAGE=y and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y. When testing
> > > on a modern systemd-based distribution, it's also necessary to enable CGroup support
> > > as well as enable support for Sun partition tables with CONFIG_SUN_PARTITION=y.
> > >
> > > Then it should be a matter of bisecting the commits between v6.3 and v6.4.
> > >
> > > I will do that within the next few days as I'm currently a bit busy with other stuff.
> >
> > OK, it turns out it's reproducible on older kernels (but not as old as 4.19) as well.
> > It's just much harder to trigger. I found a reproducer though and will try to find
> > the problematic commit next.
> >
> > [50686.808389] BUG: Bad page map in process sshd-session pte:00000002 pmd:01448000
> > [50686.905701] addr:00000100000a0000 vm_flags:00000075 anon_vma:0000000000000000 mapping:fff000003c8ca4f8 index:50
> > [50687.038425] file:sshd-session fault:filemap_fault mmap:ext4_file_mmap [ext4] read_folio:ext4_read_folio [ext4]
> > [50687.170246] CPU: 0 PID: 37883 Comm: sshd-session Not tainted 6.3.0-2-sparc64 #1 Debian 6.3.11-1
> > [50687.285751] Call Trace:
> > [50687.317771] [<0000000000d660b0>] dump_stack+0x8/0x18
> > [50687.382976] [<000000000064fd1c>] print_bad_pte+0x15c/0x200
> > [50687.455024] [<0000000000650f84>] unmap_page_range+0x3e4/0xbe0
> > [50687.530513] [<0000000000651cd8>] unmap_vmas+0xf8/0x1a0
> > [50687.597993] [<000000000065e674>] exit_mmap+0xb4/0x360
> > [50687.664331] [<00000000004647dc>] __mmput+0x3c/0x120
> > [50687.728380] [<00000000004648f4>] mmput+0x34/0x60
> > [50687.788999] [<000000000046b510>] do_exit+0x250/0xa00
> > [50687.854194] [<000000000046bea4>] do_group_exit+0x24/0xa0
> > [50687.923962] [<000000000046bf3c>] sys_exit_group+0x1c/0x40
> > [50687.994875] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> > [50688.071518] Disabling lock debugging due to kernel taint
> > [50689.484196] Unable to handle kernel paging request at virtual address 000c000002400000
> > [50689.588368] tsk->{mm,active_mm}->context = 00000000001815a6
> > [50689.661677] tsk->{mm,active_mm}->pgd = fff000000ae60000
> > [50689.730374] \|/ ____ \|/
> > "@'/ .. \`@"
> > /_| \__/ |_\
> > \__U_/
> > [50689.923679] sshd-session(37883): Oops [#1]
> > [50689.977420] CPU: 0 PID: 37883 Comm: sshd-session Tainted: G B 6.3.0-2-sparc64 #1 Debian 6.3.11-1
> > [50690.112384] TSTATE: 0000008811001607 TPC: 00000000006510cc TNPC: 00000000006510d0 Y: 00000000 Tainted: G B
> > [50690.261089] TPC: <unmap_page_range+0x52c/0xbe0>
> > [50690.320650] g0: 00000000000004a8 g1: 000c000000000000 g2: 0000000000008800 g3: ffffffffffffffff
> > [50690.435029] g4: fff0000001ef1280 g5: 0000000031200000 g6: fff0000001f04000 g7: ffffffffffffffff
> > [50690.549403] o0: 000c000002400a20 o1: 00000100000a4000 o2: 0000000100048290 o3: 0000000000000000
> > [50690.663779] o4: 0000000000000001 o5: 000000000000000d sp: fff0000001f06f61 ret_pc: 0000010000000000
> > [50690.782728] RPC: <0x10000000000>
> > [50690.825039] l0: 0000000100048290 l1: 000c000002400a20 l2: 00000100000a6000 l3: fff0000000950000
> > [50690.939419] l4: 00000100000fc000 l5: fff000000196dc20 l6: fff0000001f07938 l7: 00000000010f6fd0
> > [50691.053798] i0: fff0000001f07aa8 i1: 0000000000002000 i2: 00000100000a4000 i3: fff0000008311b00
> > [50691.168170] i4: 0000000000100000 i5: fff0000001448290 i6: fff0000001f07081 i7: 0000000000651cd8
> > [50691.282546] I7: <unmap_vmas+0xf8/0x1a0>
> > [50691.332867] Call Trace:
> > [50691.364891] [<0000000000651cd8>] unmap_vmas+0xf8/0x1a0
> > [50691.432371] [<000000000065e674>] exit_mmap+0xb4/0x360
> > [50691.498708] [<00000000004647dc>] __mmput+0x3c/0x120
> > [50691.562759] [<00000000004648f4>] mmput+0x34/0x60
> > [50691.623376] [<000000000046b510>] do_exit+0x250/0xa00
> > [50691.688573] [<000000000046bea4>] do_group_exit+0x24/0xa0
> > [50691.758340] [<000000000046bf3c>] sys_exit_group+0x1c/0x40
> > [50691.829256] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> > [50691.905886] Caller[0000000000651cd8]: unmap_vmas+0xf8/0x1a0
> > [50691.979085] Caller[000000000065e674]: exit_mmap+0xb4/0x360
> > [50692.051141] Caller[00000000004647dc]: __mmput+0x3c/0x120
> > [50692.120911] Caller[00000000004648f4]: mmput+0x34/0x60
> > [50692.187246] Caller[000000000046b510]: do_exit+0x250/0xa00
> > [50692.258160] Caller[000000000046bea4]: do_group_exit+0x24/0xa0
> > [50692.333645] Caller[000000000046bf3c]: sys_exit_group+0x1c/0x40
> > [50692.410280] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> > [50692.492629] Caller[fff0000102ad4a74]: 0xfff0000102ad4a74
> > [50692.562397] Instruction DUMP:
> > [50692.562399] ce762010
> > [50692.601281] 02f47fa8
> > [50692.632163] c4362018
> > [50692.663044] <c45c6008>
> > [50692.693926] 86100011
> > [50692.724808] 8e08a001
> > [50692.755689] 8400bfff
> > [50692.786569] 8779d402
> > [50692.817451] c458e018
> >
> > [50692.898656] Fixing recursive fault but reboot is needed!
>
> So, I was now even able to reproduce it in kernel versions as early as 5.2:
>
> [ 122.085803] Unable to handle kernel NULL pointer dereference
> [ 122.160227] tsk->{mm,active_mm}->context = 000000000000009d
> [ 122.233502] tsk->{mm,active_mm}->pgd = fff0000231d14000
> [ 122.302118] \|/ ____ \|/
> [ 122.302118] "@'/ .. \`@"
> [ 122.302118] /_| \__/ |_\
> [ 122.302118] \__U_/
> [ 122.495420] systemd(1): Oops [#1]
> [ 122.538874] CPU: 0 PID: 1 Comm: systemd Not tainted 5.2.0-3-sparc64 #1 Debian 5.2.17-1
> [ 122.642957] TSTATE: 0000004411001601 TPC: 000000000061cd94 TNPC: 000000000061cd98 Y: 00000000 Not tainted
> [ 122.772207] TPC: <vfs_getattr_nosec+0x34/0xc0>
> [ 122.830529] g0: 0000000000000000 g1: 00000000000007ff g2: 0000000000000000 g3: 00000000000007df
> [ 122.944902] g4: fff00002381771c0 g5: 0000000000000003 g6: fff0000238178000 g7: 0000000000000000
> [ 123.059275] o0: fff000023817be18 o1: 0000000000000000 o2: 0000000000000000 o3: fff000023817be18
> [ 123.173658] o4: 0000000000000000 o5: 0000000000000000 sp: fff000023817b341 ret_pc: 000000000061cd7c
> [ 123.292611] RPC: <vfs_getattr_nosec+0x1c/0xc0>
> [ 123.350933] l0: 0000010000204010 l1: fff0000101600e28 l2: e4e45b5b8ae44628 l3: 0000000000000000
> [ 123.465311] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: fff0000100bff140
> [ 123.579692] i0: fff000023817bd50 i1: fff000023817be18 i2: 0000000000000001 i3: 0000000000000900
> [ 123.694060] i4: 0000000000000000 i5: fff00002320c1210 i6: fff000023817b3f1 i7: 000000000061ce48
> [ 123.808439] I7: <vfs_getattr+0x28/0x40>
> [ 123.858759] Call Trace:
> [ 123.890785] [000000000061ce48] vfs_getattr+0x28/0x40
> [ 123.957123] [000000000061cf64] vfs_statx+0x84/0xc0
> [ 124.021173] [000000000061d918] sys_statx+0x38/0x60
> [ 124.085226] [0000000000406154] linux_sparc_syscall+0x34/0x44
> [ 124.160708] Disabling lock debugging due to kernel taint
> [ 124.230481] Caller[000000000061ce48]: vfs_getattr+0x28/0x40
> [ 124.303680] Caller[000000000061cf64]: vfs_statx+0x84/0xc0
> [ 124.374593] Caller[000000000061d918]: sys_statx+0x38/0x60
> [ 124.445503] Caller[0000000000406154]: linux_sparc_syscall+0x34/0x44
> [ 124.527857] Caller[fff00001013fde40]: 0xfff00001013fde40
> [ 124.597621] Instruction DUMP:
> [ 124.597623] c2264000
> [ 124.636505] 861027df
> [ 124.667386] c45f6028
> [ 124.698267] <c458a050>
> [ 124.729148] 8408a401
> [ 124.760031] 83789403
> [ 124.790910] c2264000
> [ 124.821801] c207600c
> [ 124.852675] 80886800
> [ 124.883556]
> [ 124.954015] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> [ 125.054721] Press Stop-A (L1-A) from sun keyboard or send break
> [ 125.054721] twice on console to return to the boot prom
> [ 125.201103] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
>
> On v5.6, I'm getting an interesting error mentioning D-cache parity errors:
>
> [ 125.743109] CPU[0]: Cheetah+ D-cache parity error at TPC[000000000056bacc]
> [ 125.833511] TPC<bpf_check+0x18ec/0x3060>
> [ 127.909612] systemd-sysv-generator[1677]: SysV service '/etc/init.d/buildd' lacks a native systemd unit file, automatically generating a unit file for compatibility.
> [ 128.104239] systemd-sysv-generator[1677]: Please update package to include a native systemd unit file.
> [ 128.227312] systemd-sysv-generator[1677]: ⚠ This compatibility logic is deprecated, expect removal soon. ⚠
> [ 129.638144] Unable to handle kernel NULL pointer dereference
> [ 129.712528] tsk->{mm,active_mm}->context = 000000000000009e
> [ 129.785808] tsk->{mm,active_mm}->pgd = fff0000233d38000
> [ 129.854522] \|/ ____ \|/
> [ 129.854522] "@'/ .. \`@"
> [ 129.854522] /_| \__/ |_\
> [ 129.854522] \__U_/
> [ 130.047827] systemd(1): Oops [#1]
> [ 130.091278] CPU: 0 PID: 1 Comm: systemd Tainted: G E 5.6.0-2-sparc64 #1 Debian 5.6.14-2
> [ 130.213664] TSTATE: 0000009911001604 TPC: 00000000005506d8 TNPC: 00000000005506dc Y: 00000000 Tainted: G E
> [ 130.361222] TPC: <bpf_prog_realloc+0x38/0xe0>
> [ 130.418486] g0: 0000000002000000 g1: 0000000000000000 g2: 0000000000000000 g3: 0000000000000002
> [ 130.532867] g4: fff000023c178000 g5: 0000000000657300 g6: fff000023c180000 g7: fff000023423e684
> [ 130.647245] o0: ffffffff00002000 o1: 0000000000000001 o2: fff0000234168fa0 o3: 0000000000000000
> [ 130.761617] o4: fff0000237761f80 o5: 0000000000000001 sp: fff000023c182fd1 ret_pc: 00000000005f2c84
> [ 130.880576] RPC: <__vfree+0x24/0x80>
> [ 130.927456] l0: ffffffffffffffff l1: 0000000000000001 l2: 0000000000000400 l3: 0000000002000000
> [ 131.041836] l4: 0000000000debc00 l5: 0000000000000100 l6: 0000000000000001 l7: fff000023c005e40
> [ 131.156213] i0: 000000010004e000 i1: 0000000000002000 i2: 0000000000100cc0 i3: fff0000237761300
> [ 131.270589] i4: 0000000000000001 i5: 0000000000000001 i6: fff000023c183081 i7: 0000000000550a70
> [ 131.384963] I7: <bpf_patch_insn_single+0x70/0x1e0>
> [ 131.447862] Call Trace:
> [ 131.479889] [0000000000550a70] bpf_patch_insn_single+0x70/0x1e0
> [ 131.558814] [000000000055fe58] bpf_patch_insn_data+0x18/0x1c0
> [ 131.635442] [000000000056bed8] bpf_check+0x1cf8/0x3060
> [ 131.704064] [0000000000559778] bpf_prog_load+0x498/0x8e0
> [ 131.774975] [0000000000559d10] __do_sys_bpf+0x150/0x1880
> [ 131.845890] [000000000055b534] sys_bpf+0x14/0x560
> [ 131.908807] [0000000000406154] linux_sparc_syscall+0x34/0x44
> [ 131.984281] Disabling lock debugging due to kernel taint
> [ 132.054051] Caller[0000000000550a70]: bpf_patch_insn_single+0x70/0x1e0
> [ 132.139832] Caller[000000000055fe58]: bpf_patch_insn_data+0x18/0x1c0
> [ 132.223325] Caller[000000000056bed8]: bpf_check+0x1cf8/0x3060
> [ 132.298811] Caller[0000000000559778]: bpf_prog_load+0x498/0x8e0
> [ 132.376588] Caller[0000000000559d10]: __do_sys_bpf+0x150/0x1880
> [ 132.454361] Caller[000000000055b534]: sys_bpf+0x14/0x560
> [ 132.524132] Caller[0000000000406154]: linux_sparc_syscall+0x34/0x44
> [ 132.606483] Caller[fff0000100995b38]: 0xfff0000100995b38
> [ 132.676247] Instruction DUMP:
> [ 132.676249] c25e2020
> [ 132.715134] 92270009
> [ 132.746014] bb326000
> [ 132.776895] <d05860e0>
> [ 132.807776] 400021a1
> [ 132.838657] 9210001d
> [ 132.869540] 80a22000
> [ 132.900422] 12400016
> [ 132.931303] 03003480
> [ 132.962184]
> [ 133.020246] systemd-journald[205]: Time jumped backwards, rotating. (Dropped 1 similar message(s))
> [ 133.138938] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> [ 133.239608] Press Stop-A (L1-A) from sun keyboard or send break
> [ 133.239608] twice on console to return to the boot prom
> [ 133.385997] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
>
> Searching the archives for that error yielded this report from 2018 [1], which seems never to
> have been addressed by David Miller.
>
> @Anthony: Can you see anything suspicious in the disassembled code that Meelis (CC'ed) posted?

OK, bisecting has led me to the following commit:

d53d2f78ceadba081fc7785570798c3c8d50a718 is the first bad commit
commit d53d2f78ceadba081fc7785570798c3c8d50a718 (HEAD)
Author: Rick Edgecombe <rick.p.edgecombe@...el.com>
Date: Thu Apr 25 17:11:38 2019 -0700

    bpf: Use vmalloc special flag

    Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
    permissioned memory in vmalloc and remove places where memory was set RW
    before freeing which is no longer needed. Don't track if the memory is RO
    anymore because it is now tracked in vmalloc.

    Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Cc: <akpm@...ux-foundation.org>
    Cc: <ard.biesheuvel@...aro.org>
    Cc: <deneen.t.dock@...el.com>
    Cc: <kernel-hardening@...ts.openwall.com>
    Cc: <kristen@...ux.intel.com>
    Cc: <linux_dti@...oud.com>
    Cc: <will.deacon@....com>
    Cc: Alexei Starovoitov <ast@...nel.org>
    Cc: Andy Lutomirski <luto@...nel.org>
    Cc: Borislav Petkov <bp@...en8.de>
    Cc: Daniel Borkmann <daniel@...earbox.net>
    Cc: Dave Hansen <dave.hansen@...ux.intel.com>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Nadav Amit <nadav.amit@...il.com>
    Cc: Rik van Riel <riel@...riel.com>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Link: https://lkml.kernel.org/r/20190426001143.4983-19-namit@vmware.com
    Signed-off-by: Ingo Molnar <mingo@...nel.org>

 include/linux/filter.h | 17 +++--------------
 kernel/bpf/core.c | 1 -
 2 files changed, 3 insertions(+), 15 deletions(-)

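
For readers unfamiliar with how a bisection arrives at a "first bad commit", here is a minimal Python sketch of the underlying binary search. It is only an illustration, not how git implements it: it assumes a linear history, a reliable reproducer, and that every commit after the first bad one is also bad. The commit names and the `crashes` check are hypothetical placeholders, not real commit IDs or a real test.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() returns True.

    Assumptions: commits are ordered oldest to newest, the newest commit
    is known to be bad, and badness is monotonic (once bad, always bad).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]


# Hypothetical linear history with placeholder names.
history = [f"commit-{i:02d}" for i in range(16)]
FIRST_BAD = "commit-11"  # assumed for the illustration
tested = []              # records which commits had to be built and booted


def crashes(commit: str) -> bool:
    """Stand-in for 'build this kernel, boot it, run the reproducer'."""
    tested.append(commit)
    return commit >= FIRST_BAD


result = first_bad_commit(history, crashes)
print(result, "found after", len(tested), "test boots")
```

The search only works if the reproducer is reliable. A bug that is merely "much harder to trigger" on some kernels can make a bad commit look good, which breaks the monotonic assumption and sends the search in the wrong direction.
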
I assume it's also related to this change:

commit 868b104d7379e28013e9d48bdd2db25e0bdcf751
Author: Rick Edgecombe <rick.p.edgecombe@...el.com>
Date: Thu Apr 25 17:11:36 2019 -0700

    mm/vmalloc: Add flag for freeing of special permsissions

    Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to
    immediately clear executable TLB entries before freeing pages, and handle
    resetting permissions on the directmap. This flag is useful for any kind
    of memory with elevated permissions, or where there can be related
    permissions changes on the directmap. Today this is RO+X and RO memory.

    Although this enables directly vfreeing non-writeable memory now,
    non-writable memory cannot be freed in an interrupt because the allocation
    itself is used as a node on deferred free list. So when RO memory needs to
    be freed in an interrupt the code doing the vfree needs to have its own
    work queue, as was the case before the deferred vfree list was added to
    vmalloc.

    For architectures with set_direct_map_ implementations this whole operation
    can be done with one TLB flush when centralized like this. For others with
    directmap permissions, currently only arm64, a backup method using
    set_memory functions is used to reset the directmap. When arm64 adds
    set_direct_map_ functions, this backup can be removed.

    When the TLB is flushed to both remove TLB entries for the vmalloc range
    mapping and the direct map permissions, the lazy purge operation could be
    done to try to save a TLB flush later. However today vm_unmap_aliases
    could flush a TLB range that does not include the directmap. So a helper
    is added with extra parameters that can allow both the vmalloc address and
    the direct mapping to be flushed during this operation. The behavior of the
    normal vm_unmap_aliases function is unchanged.

    Suggested-by: Dave Hansen <dave.hansen@...el.com>
    Suggested-by: Andy Lutomirski <luto@...nel.org>
    Suggested-by: Will Deacon <will.deacon@....com>
    Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Cc: <akpm@...ux-foundation.org>
    Cc: <ard.biesheuvel@...aro.org>
    Cc: <deneen.t.dock@...el.com>
    Cc: <kernel-hardening@...ts.openwall.com>
    Cc: <kristen@...ux.intel.com>
    Cc: <linux_dti@...oud.com>
    Cc: Borislav Petkov <bp@...en8.de>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Nadav Amit <nadav.amit@...il.com>
    Cc: Rik van Riel <riel@...riel.com>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Link: https://lkml.kernel.org/r/20190426001143.4983-17-namit@vmware.com
    Signed-off-by: Ingo Molnar <mingo@...nel.org>

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913