Message-ID: <fec617e3-8955-42c6-9cca-588e86833998@oracle.com>
Date: Mon, 8 Sep 2025 15:47:06 -0700
From: Anthony Yznaga <anthony.yznaga@...cle.com>
To: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
        Michael Karcher <kernel@...rcher.dialup.fu-berlin.de>,
        Andreas Larsson <andreas@...sler.com>
Cc: sparclinux@...r.kernel.org, linux-kernel@...r.kernel.org,
        René Rebe <rene@...ctcode.com>
Subject: Re: [PATCH v4 2/5] sparc: fix accurate exception reporting in
 copy_{from_to}_user for UltraSPARC III



On 9/7/25 11:53 PM, John Paul Adrian Glaubitz wrote:
> On Mon, 2025-09-08 at 08:47 +0200, John Paul Adrian Glaubitz wrote:
>> Hi,
>>
>> On Mon, 2025-09-08 at 08:30 +0200, John Paul Adrian Glaubitz wrote:
>>> Hi,
>>>
>>> On Sun, 2025-09-07 at 23:31 +0200, John Paul Adrian Glaubitz wrote:
>>>> Hi,
>>>>
>>>> On Sun, 2025-09-07 at 20:33 +0200, John Paul Adrian Glaubitz wrote:
>>>>> I assume that cheetah_patch_cachetlbops has to be invoked on UltraSPARC III
>>>>> since there is other code depending on it. On the other hand, the TLB code
>>>>> on UltraSPARC III was heavily overhauled in 2016 [1] which was also followed
>>>>> by a bug fix [2].
>>>>>
>>>>> Chances are there are still bugs in the code introduced in [1].
>>>>>
>>>>>> [1] https://github.com/torvalds/linux/commit/a74ad5e660a9ee1d071665e7e8ad822784a2dc7f
>>>>>> [2] https://github.com/torvalds/linux/commit/d3c976c14ad8af421134c428b0a89ff8dd3bd8f8
>>>>
>>>> I have reverted both commits. The machine boots until it tries to start
>>>> systemd, at which point it locks up. So, I guess if there is a bug in the TLB
>>>> code, it needs to be diagnosed differently.
>>>
>>> Another test with the kernel source rebased to 6.17-rc5+, with the following patch
>>> from Anthony Yznaga applied and CONFIG_SMP disabled:
>>>
>>> diff --git a/arch/sparc/mm/ultra.S b/arch/sparc/mm/ultra.S
>>> index 70e658d107e0..b323db303de1 100644
>>> --- a/arch/sparc/mm/ultra.S
>>> +++ b/arch/sparc/mm/ultra.S
>>> @@ -347,6 +347,7 @@ __cheetah_flush_tlb_kernel_range:	/* 31 insns */
>>>    	membar		#Sync
>>>    	stxa		%g0, [%o4] ASI_IMMU_DEMAP
>>>    	membar		#Sync
>>> +	flush
>>>    	retl
>>>    	 nop
>>>    	nop
>>> @@ -355,7 +356,6 @@ __cheetah_flush_tlb_kernel_range:	/* 31 insns */
>>>    	nop
>>>    	nop
>>>    	nop
>>> -	nop
>>>
>>>    #ifdef DCACHE_ALIASING_POSSIBLE
>>>    __cheetah_flush_dcache_page: /* 11 insns */
>>>
>>> Still crashes:
>>>
>>> [  139.236744] tsk->{mm,active_mm}->context = 00000000000000ab
>>> [  139.310042] tsk->{mm,active_mm}->pgd = fff0000007db8000
>>> [  139.378747]               \|/ ____ \|/
>>> [  139.378747]               "@'/ .. \`@"
>>> [  139.378747]               /_| \__/ |_\
>>> [  139.378747]                  \__U_/
>>> [  139.572059] systemd(1): Oops [#1]
>>> [  139.615613] CPU: 0 UID: 0 PID: 1 Comm: systemd Not tainted 6.17.0-rc5+ #19 NONE
>>> [  139.712832] TSTATE: 0000004411001602 TPC: 00000000005e29e4 TNPC: 00000000005e29e8 Y: 00000000    Not tainted
>>> [  139.842076] TPC: <bpf_patch_insn_data+0x204/0x2e0>
>>> [  139.905077] g0: ffffffffffffffff g1: 0000000000000000 g2: 0000000000000065 g3: fff0000009618b28
>>> [  140.019460] g4: fff00000001f9500 g5: 0000000000657300 g6: fff000000022c000 g7: 0000000000000001
>>> [  140.133837] o0: 0000000100058000 o1: 0000000000000000 o2: 0000000000000001 o3: 0000000000000002
>>> [  140.248208] o4: fff00000045ec900 o5: 0000000000000002 sp: fff000000022f031 ret_pc: 00000000005e2998
>>> [  140.367158] RPC: <bpf_patch_insn_data+0x1b8/0x2e0>
>>> [  140.430057] l0: fff0000009618000 l1: 0000000100046048 l2: 0000000000000001 l3: 0000000100058000
>>> [  140.544437] l4: 0000000100046068 l5: 0000000000000005 l6: 0000000000000000 l7: fff000000961e128
>>> [  140.658810] i0: 0000000100046000 i1: 0000000000000004 i2: 0000000000000005 i3: 0000000000000002
>>> [  140.773189] i4: 0000000100066000 i5: fff0000009618ae8 i6: fff000000022f0e1 i7: 0000000000607a08
>>> [  140.887561] I7: <bpf_check+0x1988/0x34a0>
>>> [  140.940171] Call Trace:
>>> [  140.972191] [<0000000000607a08>] bpf_check+0x1988/0x34a0
>>> [  141.041963] [<00000000005d862c>] bpf_prog_load+0x8ec/0xc80
>>> [  141.114021] [<00000000005d9be4>] __sys_bpf+0x724/0x28a0
>>> [  141.182646] [<00000000005dc338>] sys_bpf+0x18/0x60
>>> [  141.245551] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
>>> [  141.322185] Disabling lock debugging due to kernel taint
>>> [  141.391952] Caller[0000000000607a08]: bpf_check+0x1988/0x34a0
>>> [  141.467440] Caller[00000000005d862c]: bpf_prog_load+0x8ec/0xc80
>>> [  141.545212] Caller[00000000005d9be4]: __sys_bpf+0x724/0x28a0
>>> [  141.619558] Caller[00000000005dc338]: sys_bpf+0x18/0x60
>>> [  141.688179] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
>>> [  141.770535] Caller[fff000010089b80c]: 0xfff000010089b80c
>>> [  141.840301] Instruction DUMP:
>>> [  141.840305]  326ffffa
>>> [  141.879185]  c4004000
>>> [  141.910065]  c25e2038
>>> [  141.940945] <c4006108>
>>> [  141.971827]  80a0a000
>>> [  142.002709]  04400014
>>> [  142.033589]  c25860f0
>>> [  142.064474]  8400bfff
>>> [  142.095354]  8e00606c
>>> [  142.126234]
>>> [  142.176560] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>>> [  142.277218] Press Stop-A (L1-A) from sun keyboard or send break
>>> [  142.277218] twice on console to return to the boot prom
>>> [  142.423608] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
>>
>> Disabling support for Transparent Huge Pages (CONFIG_THP) avoids the crash.
> 
> Sorry, the option is called CONFIG_TRANSPARENT_HUGEPAGE, of course.
> 
> My suspicion is that it's related to the D-cache flushing, which is enabled
> for small pages only:
> 
> https://elixir.bootlin.com/linux/v6.16.5/source/arch/sparc/mm/ultra.S#L1016
> 
> and:
> 
> https://elixir.bootlin.com/linux/v6.16.5/source/arch/sparc/include/asm/page_64.h#L9
> 
> Interestingly, while running the reproducer with CONFIG_TRANSPARENT_HUGEPAGE disabled,
> I'm also getting this kernel warning, but the kernel does not crash:
> 
> [  108.733686] CPU[0]: Cheetah+ D-cache parity error at TPC[00000000005d78b4]
> [  108.824096] TPC<bpf_prog_load+0x394/0xc80>
> 
> Could it be that we need to enable the code guarded by DCACHE_ALIASING_POSSIBLE
> unconditionally?

It's already essentially enabled unconditionally. DCACHE_ALIASING_POSSIBLE
is defined whenever PAGE_SHIFT < 14, and PAGE_SHIFT will always be 13 on
sparc64 systems.
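
A minimal user-space sketch of why the guard always fires on sparc64. It assumes 8K base pages (PAGE_SHIFT 13), a page_64.h guard of the form `#if PAGE_SHIFT < 14`, and a virtually indexed L1 D-cache with 16K per way (64K, 4-way on UltraSPARC III). The helpers dcache_color(), may_alias() and aliasing_possible() are hypothetical illustrations, not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: sparc64 base pages are 8K. */
#define PAGE_SHIFT 13
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Assumption: mirrors the page_64.h guard; alias handling is only
 * needed when the page size is smaller than 16K. */
#if PAGE_SHIFT < 14
#define DCACHE_ALIASING_POSSIBLE
#endif

/* Assumption: 16K of virtually indexed D-cache per way. */
#define DCACHE_WAY_SIZE 0x4000UL

static_assert((DCACHE_WAY_SIZE & (DCACHE_WAY_SIZE - 1)) == 0,
              "way size must be a power of two");

/* Hypothetical helper: the "color" is the part of the cache index
 * above the page offset, i.e. index bits the physical page does
 * not pin down. With 8K pages and 16K ways there are two colors. */
static unsigned long dcache_color(uintptr_t vaddr)
{
    return (vaddr & (DCACHE_WAY_SIZE - 1)) >> PAGE_SHIFT;
}

/* Hypothetical helper: two virtual mappings of the same physical
 * page can cache separate copies of a line when colors differ. */
static bool may_alias(uintptr_t va1, uintptr_t va2)
{
    return dcache_color(va1) != dcache_color(va2);
}

/* Reports whether the compile-time guard above was taken. */
static bool aliasing_possible(void)
{
#ifdef DCACHE_ALIASING_POSSIBLE
    return true;
#else
    return false;
#endif
}
```

With these assumptions, mappings at 0x0 and 0x2000 fall into different colors and may alias, while 0x0 and 0x4000 share a color.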

The flushing should be happening for folios of any size. See
flush_dcache_folio()/flush_dcache_folio_all().
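
To illustrate what "folios of any size" means for a THP, here is a user-space model in which a folio of order N is flushed as 2^N base pages. It assumes 8K base pages and a PMD-sized THP of order 10 (8M on sparc64). struct fake_folio, flush_folio_pages() and the logging callback are hypothetical stand-ins, not the kernel's flush_dcache_folio() implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 13                  /* assumption: 8K base pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for a folio: 2^order contiguous base pages
 * starting at 'base'. */
struct fake_folio {
    uintptr_t base;
    unsigned int order;
};

typedef void (*page_flush_fn)(uintptr_t page_addr, void *ctx);

static size_t folio_pages(const struct fake_folio *f)
{
    return (size_t)1 << f->order;
}

/* Hypothetical helper: call the per-page flush for every base page,
 * so a huge folio gets the same treatment as 2^order small pages.
 * Returns the number of pages flushed. */
static size_t flush_folio_pages(const struct fake_folio *f,
                                page_flush_fn flush, void *ctx)
{
    size_t n = folio_pages(f);

    for (size_t i = 0; i < n; i++)
        flush(f->base + i * PAGE_SIZE, ctx);
    return n;
}

/* Example callback state: how many pages were flushed and the
 * address of the last one. */
struct flush_record {
    size_t count;
    uintptr_t last;
};

static void record_flush(uintptr_t page_addr, void *ctx)
{
    struct flush_record *rec = ctx;

    rec->count++;
    rec->last = page_addr;
}
```

Under these assumptions, an order-10 THP folio results in 1024 per-page flushes, so a bug that only flushes the head page would leave stale lines for the other 1023 pages.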

You could try setting page_poison=1 on the kernel command line to see if 
the kernel detects any freed pages being used.
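
The idea behind page poisoning can be modelled in user space: fill a page with a known pattern when it is freed and verify the pattern when it is handed out again, so a write through a stale pointer shows up. The kernel option only takes effect with CONFIG_PAGE_POISONING=y. The 0xaa byte is assumed to match the kernel's PAGE_POISON value; poison_page() and check_poison() are hypothetical helpers sketching the technique, not the code in mm/:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_PAGE_SIZE 8192
#define POISON_BYTE    0xaa   /* assumption: same value as PAGE_POISON */

/* Called on free: overwrite the whole page with the poison pattern. */
static void poison_page(unsigned char *page)
{
    memset(page, POISON_BYTE, FAKE_PAGE_SIZE);
}

/* Called on allocation: return the offset of the first byte that no
 * longer holds the pattern, or -1 if the page is intact. */
static long check_poison(const unsigned char *page)
{
    for (size_t i = 0; i < FAKE_PAGE_SIZE; i++)
        if (page[i] != POISON_BYTE)
            return (long)i;
    return -1;
}
```

A non-negative result from check_poison() corresponds to the kind of "freed page was written" report the boot option is meant to surface.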

Is this a different Cheetah+-based system than the one I borrowed?
There is definitely some sort of memory corruption happening, but the
system I used seemed much more stable than this.

Anthony

> 
> Adrian
>