Message-ID: <33c3dbaa-05e5-cb75-dd35-d05bf02fea2e@loongson.cn>
Date: Sun, 28 Sep 2025 16:39:12 +0800
From: Tiezhu Yang <yangtiezhu@...ngson.cn>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: WANG Rui <wangrui@...ngson.cn>, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] LoongArch: Add -fno-isolate-erroneous-paths-dereference in Makefile
On 2025/9/23 10:32 PM, Huacai Chen wrote:
> Hi, Tiezhu,
>
> On Tue, Sep 23, 2025 at 2:17 PM Tiezhu Yang <yangtiezhu@...ngson.cn> wrote:
>>
>> Currently, when compiling with GCC, there is no "break 0x7" instruction
>> for zero division due to using the option -mno-check-zero-division, but
>> the compiler still generates "break 0x0" instruction for zero division.
>>
>> Here is a simple example:
>>
>> $ cat test.c
>> int div(int a)
>> {
>> return a / 0;
>> }
>> $ gcc -O2 -S test.c -o test.s
>>
>> GCC generates "break 0" on LoongArch and "ud2" on x86. objtool decodes
>> "ud2" as INSN_BUG for x86, so decoding "break 0" as INSN_BUG would fix
>> the objtool warnings for LoongArch, but this is not the intention.
>>
>> When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
>> to handle "break 0" as a trap. The generated "break 0" for zero division
>> by GCC is not proper, it should generate a break instruction with proper
>> bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
>> to avoid generating the unexpected "break 0" instruction for now.
> You said that this patch makes performance increase a little. But this
> is strange, because -fisolate-erroneous-paths-dereference rather than
> -fno-isolate-erroneous-paths-dereference is considered an
> optimization.

I tested Linux 6.17-rc7 with loongson3_defconfig and saw only a
small improvement (about 0.3%) with "./Run -c 1".

Here are the test steps; anyone interested can rerun them to get
actual results in their own environment:
git clone https://github.com/kdlucas/byte-unixbench.git
cd byte-unixbench/UnixBench/
make
./Run -c 1
./Run -c 8
Here is the objdump output for sched_update_scaling() in
kernel/sched/fair.o:
Before:
000000000000bbc8 <sched_update_scaling>:
bbc8: 1a00000c pcalau12i $t0, 0
bbcc: 1a00000d pcalau12i $t1, 0
bbd0: 02c0018c addi.d $t0, $t0, 0
bbd4: 288001ae ld.w $t2, $t1, 0
bbd8: 24000190 ldptr.w $t4, $t0, 0
bbdc: 004081ce slli.w $t2, $t2, 0x0
bbe0: 40006200 beqz $t4, 96 # bc40 <sched_update_scaling+0x78>
bbe4: 0280200d addi.w $t1, $zero, 8
bbe8: 0012b9ad sltu $t1, $t1, $t2
bbec: 02802012 addi.w $t6, $zero, 8
bbf0: 0013b5cf masknez $t3, $t2, $t1
bbf4: 0013364d maskeqz $t1, $t6, $t1
bbf8: 001535ed or $t1, $t3, $t1
bbfc: 02800811 addi.w $t5, $zero, 2
bc00: 004081af slli.w $t3, $t1, 0x0
bc04: 58001611 beq $t4, $t5, 20 # bc18 <sched_update_scaling+0x50>
bc08: 400051c0 beqz $t2, 80 # bc58 <sched_update_scaling+0x90>
bc0c: 000015ad clz.w $t1, $t1
bc10: 0280800f addi.w $t3, $zero, 32
bc14: 001135ef sub.w $t3, $t3, $t1
bc18: 24000d8d ldptr.w $t1, $t0, 12
bc1c: 00150004 or $a0, $zero, $zero
bc20: 00213dad div.wu $t1, $t1, $t3
bc24: 2980418d st.w $t1, $t0, 16
bc28: 4c000020 jirl $zero, $ra, 0
bc2c: 03400000 andi $zero, $zero, 0x0
bc30: 03400000 andi $zero, $zero, 0x0
bc34: 03400000 andi $zero, $zero, 0x0
bc38: 03400000 andi $zero, $zero, 0x0
bc3c: 03400000 andi $zero, $zero, 0x0
bc40: 24000d8d ldptr.w $t1, $t0, 12
bc44: 0280040f addi.w $t3, $zero, 1
bc48: 00150004 or $a0, $zero, $zero
bc4c: 00213dad div.wu $t1, $t1, $t3
bc50: 2980418d st.w $t1, $t0, 16
bc54: 4c000020 jirl $zero, $ra, 0
bc58: 002a0000 break 0x0
bc5c: 03400000 andi $zero, $zero, 0x0
After:
000000000000bbc8 <sched_update_scaling>:
bbc8: 1a00000c pcalau12i $t0, 0
bbcc: 1a00000d pcalau12i $t1, 0
bbd0: 02c0018c addi.d $t0, $t0, 0
bbd4: 288001ae ld.w $t2, $t1, 0
bbd8: 24000190 ldptr.w $t4, $t0, 0
bbdc: 0280040f addi.w $t3, $zero, 1
bbe0: 004081ce slli.w $t2, $t2, 0x0
bbe4: 40003a00 beqz $t4, 56 # bc1c <sched_update_scaling+0x54>
bbe8: 0280200d addi.w $t1, $zero, 8
bbec: 0012b9ad sltu $t1, $t1, $t2
bbf0: 02802012 addi.w $t6, $zero, 8
bbf4: 0013b5cf masknez $t3, $t2, $t1
bbf8: 0013364d maskeqz $t1, $t6, $t1
bbfc: 001535ed or $t1, $t3, $t1
bc00: 02800811 addi.w $t5, $zero, 2
bc04: 004081af slli.w $t3, $t1, 0x0
bc08: 58001611 beq $t4, $t5, 20 # bc1c <sched_update_scaling+0x54>
bc0c: 000015ad clz.w $t1, $t1
bc10: 0280800f addi.w $t3, $zero, 32
bc14: 001135ef sub.w $t3, $t3, $t1
bc18: 001339ef maskeqz $t3, $t3, $t2
bc1c: 24000d8d ldptr.w $t1, $t0, 12
bc20: 00150004 or $a0, $zero, $zero
bc24: 00213dad div.wu $t1, $t1, $t3
bc28: 2980418d st.w $t1, $t0, 16
bc2c: 4c000020 jirl $zero, $ra, 0
With this patch there is no beqz instruction for the zero-division
check: the "beqz $t2" branch to "break 0x0" at bc08 is replaced by a
branchless maskeqz at bc18. I guess this affects performance to some
extent. IMO, the -fisolate-erroneous-paths-dereference optimization
is meant for error code paths, not for performance.
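
As a rough illustration of the difference between the two listings, here is a user-space C model reconstructed only from the disassembly above. It is a sketch, not the kernel source: the enum names, the function names, and the reading of the mode value 1 as the default path are assumptions made for this example. Only the comparisons against 0 and 2, the cap at 8, and the "32 - clz" computation come from the listings.

```c
#include <stdint.h>

/* Assumed mode values: the listings only show comparisons against 0 and 2. */
enum scaling_mode { SCALING_NONE = 0, SCALING_LOG = 1, SCALING_LINEAR = 2 };

/* Equivalent of "32 - clz(v)" for v != 0; returns 0 for v == 0. */
static unsigned int significant_bits(uint32_t v)
{
    unsigned int n = 0;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Divisor that reaches div.wu; it is 0 when cpus == 0 in LOG or LINEAR mode. */
unsigned int scaling_factor(enum scaling_mode mode, uint32_t cpus)
{
    uint32_t capped = cpus < 8 ? cpus : 8; /* sltu/masknez/maskeqz/or */

    if (mode == SCALING_NONE)
        return 1;
    if (mode == SCALING_LINEAR)
        return capped;
    return significant_bits(capped);
}

/* "Before": only the default (LOG) path with cpus == 0 branches to "break 0x0". */
int before_takes_trap_branch(enum scaling_mode mode, uint32_t cpus)
{
    return mode != SCALING_NONE && mode != SCALING_LINEAR && cpus == 0;
}

/* "After": the same path, with the zero case handled by a mask, not a branch. */
unsigned int after_log_factor(uint32_t cpus)
{
    uint32_t capped = cpus < 8 ? cpus : 8;
    uint32_t keep = (uint32_t)0 - (uint32_t)(cpus != 0); /* all ones if cpus != 0 */

    return significant_bits(capped) & keep;
}
```

In this model both variants compute the same divisor. They differ only when cpus is 0 on the default path: the "Before" code branches away to the trap instruction, while the "After" code reaches div.wu with a zero divisor.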

Anyway, my initial aim was to check whether there is any performance
regression. Judging from the test results, there is no obvious
difference with this patch.

Thanks,
Tiezhu