Message-ID: <33c3dbaa-05e5-cb75-dd35-d05bf02fea2e@loongson.cn>
Date: Sun, 28 Sep 2025 16:39:12 +0800
From: Tiezhu Yang <yangtiezhu@...ngson.cn>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: WANG Rui <wangrui@...ngson.cn>, loongarch@...ts.linux.dev,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] LoongArch: Add
 -fno-isolate-erroneous-paths-dereference in Makefile

On 2025/9/23 10:32 PM, Huacai Chen wrote:
> Hi, Tiezhu,
> 
> On Tue, Sep 23, 2025 at 2:17 PM Tiezhu Yang <yangtiezhu@...ngson.cn> wrote:
>>
>> Currently, when compiling with GCC, there is no "break 0x7" instruction
>> for zero division due to using the option -mno-check-zero-division, but
>> the compiler still generates "break 0x0" instruction for zero division.
>>
>> Here is a simple example:
>>
>>    $ cat test.c
>>    int div(int a)
>>    {
>>            return a / 0;
>>    }
>>    $ gcc -O2 -S test.c -o test.s
>>
>> GCC generates "break 0" on LoongArch and "ud2" on x86. objtool decodes
>> "ud2" as INSN_BUG for x86, so decoding "break 0" as INSN_BUG can fix the
>> objtool warnings for LoongArch, but this is not the intention.
>>
>> When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
>> to handle "break 0" as a trap. The generated "break 0" for zero division
>> by GCC is not proper, it should generate a break instruction with proper
>> bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
>> to avoid generating the unexpected "break 0" instruction for now.
> You said that this patch makes performance increase a little. But this
> is strange, because -fisolate-erroneous-paths-dereference rather than
> -fno-isolate-erroneous-paths-dereference is considered an
> optimization.

I tested Linux 6.17-rc7 with loongson3_defconfig and saw only a
small improvement (about 0.3%) with "./Run -c 1".

Here are the test steps; anyone who is interested can rerun them
to get actual results in their own environment:

   git clone https://github.com/kdlucas/byte-unixbench.git
   cd byte-unixbench/UnixBench/
   make
   ./Run -c 1
   ./Run -c 8

Here is the objdump output for sched_update_scaling() in
kernel/sched/fair.o:

Before:

000000000000bbc8 <sched_update_scaling>:
     bbc8:       1a00000c        pcalau12i       $t0, 0
     bbcc:       1a00000d        pcalau12i       $t1, 0
     bbd0:       02c0018c        addi.d          $t0, $t0, 0
     bbd4:       288001ae        ld.w            $t2, $t1, 0
     bbd8:       24000190        ldptr.w         $t4, $t0, 0
     bbdc:       004081ce        slli.w          $t2, $t2, 0x0
     bbe0:       40006200        beqz            $t4, 96 # bc40 <sched_update_scaling+0x78>
     bbe4:       0280200d        addi.w          $t1, $zero, 8
     bbe8:       0012b9ad        sltu            $t1, $t1, $t2
     bbec:       02802012        addi.w          $t6, $zero, 8
     bbf0:       0013b5cf        masknez         $t3, $t2, $t1
     bbf4:       0013364d        maskeqz         $t1, $t6, $t1
     bbf8:       001535ed        or              $t1, $t3, $t1
     bbfc:       02800811        addi.w          $t5, $zero, 2
     bc00:       004081af        slli.w          $t3, $t1, 0x0
     bc04:       58001611        beq             $t4, $t5, 20    # bc18 <sched_update_scaling+0x50>
     bc08:       400051c0        beqz            $t2, 80 # bc58 <sched_update_scaling+0x90>
     bc0c:       000015ad        clz.w           $t1, $t1
     bc10:       0280800f        addi.w          $t3, $zero, 32
     bc14:       001135ef        sub.w           $t3, $t3, $t1
     bc18:       24000d8d        ldptr.w         $t1, $t0, 12
     bc1c:       00150004        or              $a0, $zero, $zero
     bc20:       00213dad        div.wu          $t1, $t1, $t3
     bc24:       2980418d        st.w            $t1, $t0, 16
     bc28:       4c000020        jirl            $zero, $ra, 0
     bc2c:       03400000        andi            $zero, $zero, 0x0
     bc30:       03400000        andi            $zero, $zero, 0x0
     bc34:       03400000        andi            $zero, $zero, 0x0
     bc38:       03400000        andi            $zero, $zero, 0x0
     bc3c:       03400000        andi            $zero, $zero, 0x0
     bc40:       24000d8d        ldptr.w         $t1, $t0, 12
     bc44:       0280040f        addi.w          $t3, $zero, 1
     bc48:       00150004        or              $a0, $zero, $zero
     bc4c:       00213dad        div.wu          $t1, $t1, $t3
     bc50:       2980418d        st.w            $t1, $t0, 16
     bc54:       4c000020        jirl            $zero, $ra, 0
     bc58:       002a0000        break           0x0
     bc5c:       03400000        andi            $zero, $zero, 0x0

After:

000000000000bbc8 <sched_update_scaling>:
     bbc8:       1a00000c        pcalau12i       $t0, 0
     bbcc:       1a00000d        pcalau12i       $t1, 0
     bbd0:       02c0018c        addi.d          $t0, $t0, 0
     bbd4:       288001ae        ld.w            $t2, $t1, 0
     bbd8:       24000190        ldptr.w         $t4, $t0, 0
     bbdc:       0280040f        addi.w          $t3, $zero, 1
     bbe0:       004081ce        slli.w          $t2, $t2, 0x0
     bbe4:       40003a00        beqz            $t4, 56 # bc1c <sched_update_scaling+0x54>
     bbe8:       0280200d        addi.w          $t1, $zero, 8
     bbec:       0012b9ad        sltu            $t1, $t1, $t2
     bbf0:       02802012        addi.w          $t6, $zero, 8
     bbf4:       0013b5cf        masknez         $t3, $t2, $t1
     bbf8:       0013364d        maskeqz         $t1, $t6, $t1
     bbfc:       001535ed        or              $t1, $t3, $t1
     bc00:       02800811        addi.w          $t5, $zero, 2
     bc04:       004081af        slli.w          $t3, $t1, 0x0
     bc08:       58001611        beq             $t4, $t5, 20    # bc1c <sched_update_scaling+0x54>
     bc0c:       000015ad        clz.w           $t1, $t1
     bc10:       0280800f        addi.w          $t3, $zero, 32
     bc14:       001135ef        sub.w           $t3, $t3, $t1
     bc18:       001339ef        maskeqz         $t3, $t3, $t2
     bc1c:       24000d8d        ldptr.w         $t1, $t0, 12
     bc20:       00150004        or              $a0, $zero, $zero
     bc24:       00213dad        div.wu          $t1, $t1, $t3
     bc28:       2980418d        st.w            $t1, $t0, 16
     bc2c:       4c000020        jirl            $zero, $ra, 0

With this patch, there is no beqz instruction guarding the zero
division path (the "beqz $t2" that branched to "break 0x0" is
replaced by a maskeqz), and I guess this affects the performance
to some extent. IMO, the isolate-erroneous-paths-dereference
optimization is for the error code path, not for performance.
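
To make the difference between the two listings easier to follow, here is a minimal C sketch of the two code shapes. It is a hand-written model inferred from the disassembly above, not the kernel source: the names fls_model(), factor_after() and scaled_before() are hypothetical, and __builtin_clz()/__builtin_trap() are GCC/Clang builtins used as stand-ins for "clz.w" and "break 0x0".

```c
/* Hypothetical model, inferred from the listings above; not kernel source. */

/* Assumed fls()-style helper: index of the highest set bit, 0 when x == 0. */
static unsigned int fls_model(unsigned int x)
{
    return x ? 32u - (unsigned int)__builtin_clz(x) : 0u;
}

/*
 * Shape of the "After" listing: the divisor is selected without a branch
 * (maskeqz) and may be 0; the caller then divides by it unconditionally.
 */
static unsigned int factor_after(unsigned int cpus)
{
    return fls_model(cpus);
}

/*
 * Shape of the "Before" listing: the path on which the divisor is a
 * known zero is split off (beqz) and ends in a trap ("break 0x0").
 */
static unsigned int scaled_before(unsigned int slice, unsigned int cpus)
{
    if (cpus == 0)
        __builtin_trap();
    return slice / (32u - (unsigned int)__builtin_clz(cpus));
}
```

In this model, factor_after(0) returns 0 and the following division is left to the hardware, whereas scaled_before() with cpus == 0 never reaches the division and traps instead.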

Anyway, my initial aim was to check whether there is any
performance regression. Judging from the test results, there
is no obvious difference with this patch.

Thanks,
Tiezhu

