Message-ID: <a9eec6f5a51c82cd2a20a96d614cfd3095ddce88.camel@physik.fu-berlin.de>
Date: Mon, 08 Sep 2025 08:53:14 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: Michael Karcher <kernel@...rcher.dialup.fu-berlin.de>, Andreas Larsson
	 <andreas@...sler.com>
Cc: sparclinux@...r.kernel.org, linux-kernel@...r.kernel.org, Anthony Yznaga
	 <anthony.yznaga@...cle.com>, René Rebe
 <rene@...ctcode.com>
Subject: Re: [PATCH v4 2/5] sparc: fix accurate exception reporting in
 copy_{from_to}_user for UltraSPARC III

On Mon, 2025-09-08 at 08:47 +0200, John Paul Adrian Glaubitz wrote:
> Hi,
> 
> On Mon, 2025-09-08 at 08:30 +0200, John Paul Adrian Glaubitz wrote:
> > Hi,
> > 
> > On Sun, 2025-09-07 at 23:31 +0200, John Paul Adrian Glaubitz wrote:
> > > Hi,
> > > 
> > > On Sun, 2025-09-07 at 20:33 +0200, John Paul Adrian Glaubitz wrote:
> > > > I assume that cheetah_patch_cachetlbops has to be invoked on UltraSPARC III
> > > > since there is other code depending on it. On the other hand, the TLB code
> > > > on UltraSPARC III was heavily overhauled in 2016 [1] which was also followed
> > > > by a bug fix [2].
> > > > 
> > > > Chances are there are still bugs in the code introduced in [1].
> > > > 
> > > > > [1] https://github.com/torvalds/linux/commit/a74ad5e660a9ee1d071665e7e8ad822784a2dc7f
> > > > > [2] https://github.com/torvalds/linux/commit/d3c976c14ad8af421134c428b0a89ff8dd3bd8f8
> > > 
> > > I have reverted both commits. The machine boots until it tries to start
> > > systemd when it locks up. So, I guess if there is a bug in the TLB code
> > > it needs to be diagnosed differently.
> > 
> > Another test with a kernel source rebased to 6.17-rc5+, with the following patch applied
> > by Anthony Yznaga and CONFIG_SMP disabled:
> > 
> > diff --git a/arch/sparc/mm/ultra.S b/arch/sparc/mm/ultra.S
> > index 70e658d107e0..b323db303de1 100644
> > --- a/arch/sparc/mm/ultra.S
> > +++ b/arch/sparc/mm/ultra.S
> > @@ -347,6 +347,7 @@ __cheetah_flush_tlb_kernel_range:	/* 31 insns */
> >   	membar		#Sync
> >   	stxa		%g0, [%o4] ASI_IMMU_DEMAP
> >   	membar		#Sync
> > +	flush
> >   	retl
> >   	 nop
> >   	nop
> > @@ -355,7 +356,6 @@ __cheetah_flush_tlb_kernel_range:	/* 31 insns */
> >   	nop
> >   	nop
> >   	nop
> > -	nop
> > 
> >   #ifdef DCACHE_ALIASING_POSSIBLE
> >   __cheetah_flush_dcache_page: /* 11 insns */
> > 
> > Still crashes:
> > 
> > [  139.236744] tsk->{mm,active_mm}->context = 00000000000000ab
> > [  139.310042] tsk->{mm,active_mm}->pgd = fff0000007db8000
> > [  139.378747]               \|/ ____ \|/
> > [  139.378747]               "@'/ .. \`@"
> > [  139.378747]               /_| \__/ |_\
> > [  139.378747]                  \__U_/
> > [  139.572059] systemd(1): Oops [#1]
> > [  139.615613] CPU: 0 UID: 0 PID: 1 Comm: systemd Not tainted 6.17.0-rc5+ #19 NONE 
> > [  139.712832] TSTATE: 0000004411001602 TPC: 00000000005e29e4 TNPC: 00000000005e29e8 Y: 00000000    Not tainted
> > [  139.842076] TPC: <bpf_patch_insn_data+0x204/0x2e0>
> > [  139.905077] g0: ffffffffffffffff g1: 0000000000000000 g2: 0000000000000065 g3: fff0000009618b28
> > [  140.019460] g4: fff00000001f9500 g5: 0000000000657300 g6: fff000000022c000 g7: 0000000000000001
> > [  140.133837] o0: 0000000100058000 o1: 0000000000000000 o2: 0000000000000001 o3: 0000000000000002
> > [  140.248208] o4: fff00000045ec900 o5: 0000000000000002 sp: fff000000022f031 ret_pc: 00000000005e2998
> > [  140.367158] RPC: <bpf_patch_insn_data+0x1b8/0x2e0>
> > [  140.430057] l0: fff0000009618000 l1: 0000000100046048 l2: 0000000000000001 l3: 0000000100058000
> > [  140.544437] l4: 0000000100046068 l5: 0000000000000005 l6: 0000000000000000 l7: fff000000961e128
> > [  140.658810] i0: 0000000100046000 i1: 0000000000000004 i2: 0000000000000005 i3: 0000000000000002
> > [  140.773189] i4: 0000000100066000 i5: fff0000009618ae8 i6: fff000000022f0e1 i7: 0000000000607a08
> > [  140.887561] I7: <bpf_check+0x1988/0x34a0>
> > [  140.940171] Call Trace:
> > [  140.972191] [<0000000000607a08>] bpf_check+0x1988/0x34a0
> > [  141.041963] [<00000000005d862c>] bpf_prog_load+0x8ec/0xc80
> > [  141.114021] [<00000000005d9be4>] __sys_bpf+0x724/0x28a0
> > [  141.182646] [<00000000005dc338>] sys_bpf+0x18/0x60
> > [  141.245551] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> > [  141.322185] Disabling lock debugging due to kernel taint
> > [  141.391952] Caller[0000000000607a08]: bpf_check+0x1988/0x34a0
> > [  141.467440] Caller[00000000005d862c]: bpf_prog_load+0x8ec/0xc80
> > [  141.545212] Caller[00000000005d9be4]: __sys_bpf+0x724/0x28a0
> > [  141.619558] Caller[00000000005dc338]: sys_bpf+0x18/0x60
> > [  141.688179] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> > [  141.770535] Caller[fff000010089b80c]: 0xfff000010089b80c
> > [  141.840301] Instruction DUMP:
> > [  141.840305]  326ffffa 
> > [  141.879185]  c4004000 
> > [  141.910065]  c25e2038 
> > [  141.940945] <c4006108>
> > [  141.971827]  80a0a000 
> > [  142.002709]  04400014 
> > [  142.033589]  c25860f0 
> > [  142.064474]  8400bfff 
> > [  142.095354]  8e00606c 
> > [  142.126234] 
> > [  142.176560] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> > [  142.277218] Press Stop-A (L1-A) from sun keyboard or send break
> > [  142.277218] twice on console to return to the boot prom
> > [  142.423608] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
> 
> Disabling support for Transparent Huge Pages (CONFIG_THP) avoids the crash.

Sorry, the option is called CONFIG_TRANSPARENT_HUGEPAGE, of course.

My suspicion is that it's related to the D-cache flush handling, which is enabled
for small pages only:

https://elixir.bootlin.com/linux/v6.16.5/source/arch/sparc/mm/ultra.S#L1016

and:

https://elixir.bootlin.com/linux/v6.16.5/source/arch/sparc/include/asm/page_64.h#L9
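
To illustrate why the page size matters for D-cache aliasing, here is a minimal
sketch. It is not kernel code. It assumes 8 KiB base pages and a virtually
indexed D-cache whose index covers 16 KiB; both constants and both function
names are hypothetical and chosen for illustration only. Under those
assumptions, one virtual address bit above the page offset takes part in cache
indexing, so two mappings of the same physical page can land in different
cache lines:

```c
#include <stdint.h>

/* Assumed values for illustration only, not taken from the kernel sources. */
#define ASSUMED_PAGE_SHIFT        13  /* 8 KiB base pages */
#define ASSUMED_DCACHE_INDEX_BITS 14  /* D-cache index spans 16 KiB */

/*
 * Return the "cache color" of a virtual address: the index bits that lie
 * above the page offset. With the values above this is a single bit (bit 13).
 */
unsigned int dcache_color(uint64_t vaddr)
{
	uint64_t index_mask = (UINT64_C(1) << ASSUMED_DCACHE_INDEX_BITS) - 1;

	return (unsigned int)((vaddr & index_mask) >> ASSUMED_PAGE_SHIFT);
}

/*
 * Two virtual mappings of the same physical page can occupy different
 * cache lines (alias) only if their colors differ.
 */
int may_alias(uint64_t va1, uint64_t va2)
{
	return dcache_color(va1) != dcache_color(va2);
}
```

In this model, mappings at 0x10000 and 0x12000 have different colors and may
alias, while 0x10000 and 0x14000 share a color and cannot. If the page size
were at least as large as the index span, the color would always be 0, which is
presumably why the flush code is compiled in for small pages only.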

Interestingly, while running the reproducer with CONFIG_TRANSPARENT_HUGEPAGE disabled,
I'm also getting this kernel warning, but the kernel does not crash:

[  108.733686] CPU[0]: Cheetah+ D-cache parity error at TPC[00000000005d78b4]
[  108.824096] TPC<bpf_prog_load+0x394/0xc80>

Could it be that we need to enable the code guarded by DCACHE_ALIASING_POSSIBLE
unconditionally?

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
