Message-ID: <5303aeda-5c66-ede6-b3ac-7d8ebd73ec70@loongson.cn>
Date: Mon, 6 Feb 2023 18:24:53 +0800
From: Jianmin Lv <lvjianmin@...ngson.cn>
To: WANG Xuerui <kernel@...0n.name>,
Huacai Chen <chenhuacai@...ngson.cn>,
Arnd Bergmann <arnd@...db.de>,
Huacai Chen <chenhuacai@...nel.org>
Cc: loongarch@...ts.linux.dev, linux-arch@...r.kernel.org,
Xuefeng Li <lixuefeng@...ngson.cn>,
Guo Ren <guoren@...nel.org>,
Jiaxun Yang <jiaxun.yang@...goat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] LoongArch: Make -mstrict-align be configurable
On 2023/2/2 6:30 PM, WANG Xuerui wrote:
> On 2023/2/2 16:42, Huacai Chen wrote:
>> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
>> configurable.
>>
>> Not all LoongArch cores support h/w unaligned access, we can use the
>> -mstrict-align build parameter to prevent unaligned accesses.
>>
>> This option is disabled by default to optimise for performance, but you
>> can enable it manually if you want to run the kernel on systems without
>> h/w unaligned access support.
>
> It's customary to accompany "performance-related" changes like this with
> some benchmark numbers and concrete use cases where this would be
> profitable. Especially given that arch/loongarch developer and user base
> is relatively small, we probably don't want to allow customization of
> such a low-level characteristic. In general kernel performance does not
> vary much with compiler flags like this, so I'd really hope to see some
> numbers here to convince people that this is *really* providing gains.
>
> Also, defaulting to emitting unaligned accesses would mean those future,
> likely embedded models (and AFAIK some existing models that haven't
> reached GA yet) would lose support with the defconfig. Which means
> downstream packagers that care about those use cases would have one more
> non-default, non-generic option to carry within their Kconfig. We
> probably don't want to repeat the history of other architectures (think
> arch/arm or arch/mips) where there weren't really generic builds and
> board-specific tweaks proliferated.
>
Hi Xuerui,
I think the kernels built with and without -mstrict-align mainly have the
following differences:
- Different size. I built two kernels (vmlinux): the size of the kernel with
-mstrict-align is 26533376 bytes and the size of the kernel without
-mstrict-align is 26123280 bytes.
- Different performance. For example, for the kernel function jhash(), the
assembly code generated with and without -mstrict-align is as follows (a
simplified C sketch of the access pattern follows the two listings):
without -mstrict-align:
900000000032736c <jhash>:
900000000032736c:   15bd5b6d    lu12i.w  $t1, -136485(0xdeadb)
9000000000327370: 03bbbdad ori $t1, $t1, 0xeef
9000000000327374: 001019ad add.w $t1, $t1, $a2
9000000000327378: 001015ae add.w $t2, $t1, $a1
900000000032737c:   0280300c    addi.w   $t0, $zero, 12(0xc)
9000000000327380: 00150091 move $t5, $a0
9000000000327384: 001501d0 move $t4, $t2
9000000000327388: 001501c4 move $a0, $t2
900000000032738c:   6c009585    bgeu     $t0, $a1, 148(0x94) # 9000000000327420 <jhash+0xb4>
9000000000327390:   02803012    addi.w   $t6, $zero, 12(0xc)
9000000000327394: 24000a2f ldptr.w $t3, $t5, 8(0x8)
9000000000327398: 2400022d ldptr.w $t1, $t5, 0
900000000032739c: 2400062c ldptr.w $t0, $t5, 4(0x4)
90000000003273a0: 001011e4 add.w $a0, $t3, $a0
90000000003273a4: 001111af sub.w $t3, $t1, $a0
90000000003273a8: 001039ef add.w $t3, $t3, $t2
90000000003273ac: 004cf08e rotri.w $t2, $a0, 0x1c
90000000003273b0: 0010418c add.w $t0, $t0, $t4
...
with -mstrict-align:
90000000003310c0 <jhash>:
90000000003310c0:   15bd5b6f    lu12i.w  $t3, -136485(0xdeadb)
90000000003310c4: 03bbbdef ori $t3, $t3, 0xeef
90000000003310c8: 001019ef add.w $t3, $t3, $a2
90000000003310cc: 001015e6 add.w $a2, $t3, $a1
90000000003310d0: 0280300d addi.w $t1, $zero, 12(0xc)
90000000003310d4: 0015008c move $t0, $a0
90000000003310d8: 001500d2 move $t6, $a2
90000000003310dc: 001500c4 move $a0, $a2
90000000003310e0:   6c0101a5    bgeu     $t1, $a1, 256(0x100) # 90000000003311e0 <jhash+0x120>
90000000003310e4: 02803011 addi.w $t5, $zero, 12(0xc)
90000000003310e8: 2a002589 ld.bu $a5, $t0, 9(0x9)
90000000003310ec: 2a00218d ld.bu $t1, $t0, 8(0x8)
90000000003310f0: 2a002988 ld.bu $a4, $t0, 10(0xa)
90000000003310f4: 2a000587 ld.bu $a3, $t0, 1(0x1)
90000000003310f8: 2a002d8e ld.bu $t2, $t0, 11(0xb)
90000000003310fc: 2a00018b ld.bu $a7, $t0, 0
9000000000331100: 2a000994 ld.bu $t8, $t0, 2(0x2)
9000000000331104: 2a001593 ld.bu $t7, $t0, 5(0x5)
9000000000331108: 2a000d8f ld.bu $t3, $t0, 3(0x3)
900000000033110c: 00412129 slli.d $a5, $a5, 0x8
9000000000331110: 2a00118a ld.bu $a6, $t0, 4(0x4)
9000000000331114: 2a001990 ld.bu $t4, $t0, 6(0x6)
9000000000331118: 00153529 or $a5, $a5, $t1
...
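The difference comes from jhash() reading 32-bit words from a byte pointer
whose alignment is unknown. The following is only a simplified sketch of that
access pattern (the real code lives in include/linux/jhash.h and uses the
kernel's unaligned-access helpers; the helper and function names below are
mine, for illustration only):

#include <linux/string.h>
#include <linux/types.h>

/*
 * Read a 32-bit word from a possibly unaligned address. The memcpy() lets
 * the compiler choose the access sequence: when unaligned access is allowed
 * it becomes a single 32-bit load, while under -mstrict-align it is expanded
 * into four ld.bu plus shifts and ors, as in the listing above.
 */
static inline u32 read_u32_unaligned(const u8 *p)
{
	u32 v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Sketch of the hot loop: mix three 32-bit words per 12-byte round. */
static u32 mix_words(const u8 *k, u32 len, u32 a, u32 b, u32 c)
{
	while (len > 12) {
		a += read_u32_unaligned(k);
		b += read_u32_unaligned(k + 4);
		c += read_u32_unaligned(k + 8);
		/* __jhash_mix(a, b, c) would go here in the real code */
		len -= 12;
		k += 12;
	}
	return a ^ b ^ c;
}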
It seems difficult for me to test the performance difference in a real
kernel path with unaligned-access code, so I used a kernel module (with
simple test code) to show the difference on a 3A5000:
C code:
preempt_disable();
start = ktime_get_ns();
for (i = 0; i < n; i++)
assign(p1[i], q1[i]);
end = ktime_get_ns();
preempt_enable();
printk("mstrict-align-test took: %lld nsec\n", end - start);
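For completeness, here is a minimal sketch of what the rest of the test
module could look like. Only the timed loop above (assign(), p1, q1, n and
the preempt/ktime/printk calls) comes from the real module; the buffer
layout, the deliberate +1 misalignment, the value of N and the init/exit
boilerplate are my assumptions. Judging from the listings below, assign()
is a separate 8-byte copy helper, so it is sketched that way here:

#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/module.h>
#include <linux/preempt.h>
#include <linux/vmalloc.h>

#define N (1 << 20)	/* iteration count: an assumption, not the real n */

static u64 **p1, **q1;
static u8 *dst_buf, *src_buf;

/*
 * 8-byte copy through possibly unaligned pointers; noinline so the compiler
 * cannot see the (mis)alignment at the call site and must emit either
 * ldptr.d/stptr.d or a byte-by-byte sequence depending on -mstrict-align.
 */
static noinline void assign(u64 *d, const u64 *s)
{
	*d = *s;
}

static int __init align_test_init(void)
{
	s64 start, end;
	int i, n = N;

	dst_buf = vmalloc(n * 8 + 8);
	src_buf = vmalloc(n * 8 + 8);
	p1 = vmalloc(n * sizeof(*p1));
	q1 = vmalloc(n * sizeof(*q1));
	if (!dst_buf || !src_buf || !p1 || !q1) {
		vfree(q1);
		vfree(p1);
		vfree(src_buf);
		vfree(dst_buf);
		return -ENOMEM;
	}

	/* The +1 offset makes every 8-byte access cross natural alignment. */
	for (i = 0; i < n; i++) {
		p1[i] = (u64 *)(dst_buf + i * 8 + 1);
		q1[i] = (u64 *)(src_buf + i * 8 + 1);
	}

	preempt_disable();
	start = ktime_get_ns();
	for (i = 0; i < n; i++)
		assign(p1[i], q1[i]);
	end = ktime_get_ns();
	preempt_enable();
	printk("mstrict-align-test took: %lld nsec\n", end - start);
	return 0;
}

static void __exit align_test_exit(void)
{
	vfree(q1);
	vfree(p1);
	vfree(src_buf);
	vfree(dst_buf);
}

module_init(align_test_init);
module_exit(align_test_exit);
MODULE_LICENSE("GPL");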
assembly code without -mstrict-align:
0: 260000ac ldptr.d $t0, $a1, 0
4: 2700008c stptr.d $t0, $a0, 0
8: 4c000020 jirl $zero, $ra, 0
assembly code with -mstrict-align:
0: 2a0000b3 ld.bu $t7, $a1, 0
4: 2a0004b2 ld.bu $t6, $a1, 1(0x1)
8: 2a0008b1 ld.bu $t5, $a1, 2(0x2)
c: 2a000cb0 ld.bu $t4, $a1, 3(0x3)
10: 2a0010af ld.bu $t3, $a1, 4(0x4)
14: 2a0014ae ld.bu $t2, $a1, 5(0x5)
18: 2a0018ad ld.bu $t1, $a1, 6(0x6)
1c: 2a001cac ld.bu $t0, $a1, 7(0x7)
20: 29000093 st.b $t7, $a0, 0
24: 29000492 st.b $t6, $a0, 1(0x1)
28: 29000891 st.b $t5, $a0, 2(0x2)
2c: 29000c90 st.b $t4, $a0, 3(0x3)
30: 2900108f st.b $t3, $a0, 4(0x4)
34: 2900148e st.b $t2, $a0, 5(0x5)
38: 2900188d st.b $t1, $a0, 6(0x6)
3c: 29001c8c st.b $t0, $a0, 7(0x7)
40: 4c000020 jirl $zero, $ra, 0
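In other words, where the unaligned build copies the 8 bytes with a single
ldptr.d/stptr.d pair, the strict-align build expands the copy into eight
ld.bu and eight st.b, i.e. 16 memory operations instead of 2. That expansion
is roughly equivalent to the following illustrative C (the compiler performs
the expansion itself; the helper name is mine):

/*
 * Byte-wise equivalent of the strict-align code above: every access is one
 * byte wide, so no alignment assumption is needed, at the cost of eight
 * loads and eight stores per 64-bit copy.
 */
static void copy_u64_bytewise(unsigned char *dst, const unsigned char *src)
{
	int i;

	for (i = 0; i < 8; i++)
		dst[i] = src[i];
}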
and the test results (run 3 times each) are as follows:
the module built without -mstrict-align:
[root@...nEuler loongson]# insmod align-test.ko
[ 39.029931] mstrict-align-test took: 29603510 nsec
[root@...nEuler loongson]# rmmod align-test.ko
[root@...nEuler loongson]# insmod align-test.ko
[ 41.356007] mstrict-align-test took: 28816710 nsec
[root@...nEuler loongson]# rmmod align-test.ko
[root@...nEuler loongson]# insmod align-test.ko
[ 43.506624] mstrict-align-test took: 30030700 nsec
[root@...nEuler loongson]# rmmod align-test.ko
the module built with -mstrict-align:
[root@...nEuler ~]# insmod align-test.ko
[ 92.656477] mstrict-align-test took: 59629000 nsec
[root@...nEuler ~]# rmmod align-test.ko
[root@...nEuler ~]# insmod align-test.ko
[ 99.473011] mstrict-align-test took: 58972250 nsec
[root@...nEuler ~]# rmmod align-test.ko
[root@...nEuler ~]# insmod align-test.ko
[ 104.620103] mstrict-align-test took: 59419260 nsec
[root@...nEuler ~]# rmmod align-test.ko
So in this micro-benchmark, the module built with -mstrict-align is roughly
2x slower on 3A5000 (about 59 ms vs. about 29 ms per run).

Thanks!
Jianmin
>>
>> Signed-off-by: Huacai Chen <chenhuacai@...ngson.cn>
>> ---
>> arch/loongarch/Kconfig | 10 ++++++++++
>> arch/loongarch/Makefile | 2 ++
>> 2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index 9cc8b84f7eb0..7470dcfb32f0 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -441,6 +441,16 @@ config ARCH_IOREMAP
>> protection support. However, you can enable LoongArch DMW-based
>> ioremap() for better performance.
>> +config ARCH_STRICT_ALIGN
>> + bool "Enable -mstrict-align to prevent unaligned accesses"
>> + help
>> + Not all LoongArch cores support h/w unaligned access, we can use
>> + -mstrict-align build parameter to prevent unaligned accesses.
>> +
>> + This is disabled by default to optimise for performance, you can
>> + enabled it manually if you want to run kernel on systems without
>> + h/w unaligned access support.
>> +
>> config KEXEC
>> bool "Kexec system call"
>> select KEXEC_CORE
>> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
>> index 4402387d2755..ccfb52700237 100644
>> --- a/arch/loongarch/Makefile
>> +++ b/arch/loongarch/Makefile
>> @@ -91,10 +91,12 @@ KBUILD_CPPFLAGS += -DVMLINUX_LOAD_ADDRESS=$(load-y)
>> # instead of .eh_frame so we don't discard them.
>> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>> +ifdef CONFIG_ARCH_STRICT_ALIGN
>> # Don't emit unaligned accesses.
>> # Not all LoongArch cores support unaligned access, and as kernel we can't
>> # rely on others to provide emulation for these accesses.
>> KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
>> +endif
>>
>> KBUILD_CFLAGS += -isystem $(shell $(CC) -print-file-name=include)
>