Message-ID: <d782be07-1cfa-626f-d9f5-d151bd091214@loongson.cn>
Date: Sat, 25 Jun 2022 10:09:59 +0800
From: Tiezhu Yang <yangtiezhu@...ngson.cn>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: WANG Xuerui <kernel@...0n.name>,
Xuefeng Li <lixuefeng@...ngson.cn>,
Jianmin Lv <lvjianmin@...ngson.cn>, Jun Yi <yijun@...ngson.cn>,
Rui Wang <wangrui@...ngson.cn>,
LKML <linux-kernel@...r.kernel.org>,
Jiaxun Yang <jiaxun.yang@...goat.com>,
loongarch@...ts.linux.dev, Arnd Bergmann <arnd@...db.de>,
Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH v2 2/2] LoongArch: No need to call RESTORE_ALL_AND_RET for
all syscalls
Cc loongarch@...ts.linux.dev
Arnd Bergmann <arnd@...db.de>
Guo Ren <guoren@...nel.org>
On 06/23/2022 08:43 AM, Tiezhu Yang wrote:
> Cc Jiaxun Yang <jiaxun.yang@...goat.com>
>
> On 06/22/2022 06:01 PM, Huacai Chen wrote:
>> Hi, Tiezhu,
>>
>> On Tue, Jun 21, 2022 at 6:08 PM Tiezhu Yang <yangtiezhu@...ngson.cn>
>> wrote:
>>>
>>> In handle_syscall, it is unnecessary to call RESTORE_ALL_AND_RET
>>> for all syscalls.
>>>
>>> (1) rt_sigreturn calls RESTORE_ALL_AND_RET.
>>> (2) The other syscalls call RESTORE_STATIC_SOME_SP_AND_RET.
>>>
>>> This patch keeps the changes minimal and as simple as possible
>>> to limit code complexity; at the same time, it eliminates many
>>> load instructions.
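>>>
>>> Expressed as C-like pseudocode (an illustrative sketch only; the
>>> helper names below just mirror the assembly macros and are not
>>> real kernel functions):
>>>
>>>         do_syscall(regs);
>>>
>>>         if (saved_syscall_nr == __NR_rt_sigreturn)
>>>                 restore_all_and_ret();             /* full register reload */
>>>         else
>>>                 restore_static_some_sp_and_ret();  /* skips the temporary-register reload */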
>>>
>>> Here are the test environments:
>>>
>>> Hardware: Loongson-LS3A5000-7A1000-1w-A2101
>>> Firmware: UDK2018-LoongArch-A2101-pre-beta8 [1]
>>> System: loongarch64-clfs-system-5.0 [2]
>>>
>>> The system passed functional testing with the following test
>>> case, both without and with this patch:
>>>
>>> git clone https://github.com/hevz/sigaction-test.git
>>> cd sigaction-test
>>> make check
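>>>
>>> For reference, a minimal standalone example of the kind of signal
>>> round-trip such a test exercises (not part of the sigaction-test
>>> suite itself): the return from the handler goes through
>>> rt_sigreturn, i.e. the RESTORE_ALL_AND_RET path above.
>>>
>>> #include <signal.h>
>>> #include <stdio.h>
>>>
>>> static volatile sig_atomic_t hit;
>>>
>>> static void handler(int sig)
>>> {
>>>         (void)sig;
>>>         hit = 1;
>>> }
>>>
>>> int main(void)
>>> {
>>>         struct sigaction sa = { 0 };
>>>
>>>         sa.sa_handler = handler;
>>>         if (sigaction(SIGUSR1, &sa, NULL))
>>>                 return 1;
>>>         raise(SIGUSR1);               /* handler returns via rt_sigreturn */
>>>         printf("handler ran: %d\n", (int)hit);
>>>         return hit ? 0 : 1;
>>> }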
>>>
>>> Additionally, use UnixBench syscall to test the performance:
>>>
>>> git clone https://github.com/kdlucas/byte-unixbench.git
>>> cd byte-unixbench/UnixBench/
>>> make
>>> pgms/syscall 600
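>>>
>>> Roughly, the syscall benchmark just issues cheap system calls in a
>>> tight loop for the requested duration. A simplified sketch of that
>>> idea (not the actual UnixBench source, see [3]):
>>>
>>> #include <stdio.h>
>>> #include <sys/stat.h>
>>> #include <sys/types.h>
>>> #include <time.h>
>>> #include <unistd.h>
>>>
>>> int main(void)
>>> {
>>>         unsigned long iters = 0;
>>>         time_t end = time(NULL) + 10;   /* 10 s instead of 600 s */
>>>
>>>         while (time(NULL) < end) {
>>>                 close(dup(0));          /* a few cheap syscalls per loop */
>>>                 getpid();
>>>                 getuid();
>>>                 umask(022);
>>>                 iters++;
>>>         }
>>>         printf("%lu loops\n", iters);
>>>         return 0;
>>> }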
>>>
>>> In order to avoid performance interference from other processes,
>>> add init=/bin/bash to the boot cmdline.
>>>
>>> Here is the test result (bigger is better); it shows about a
>>> 1.2% gain when tested with close, getpid and exec [3]:
>>>
>>> duration    without_this_patch    with_this_patch
>>> 600 s       626558267 lps         634244079 lps
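>>>
>>> That is, (634244079 - 626558267) / 626558267 ≈ 1.23%.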
>>>
>>> [1] https://github.com/loongson/Firmware/tree/main/5000Series/PC/A2101
>>> [2] https://github.com/sunhaiyong1978/CLFS-for-LoongArch/releases/tag/5.0
>>> [3] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/syscall.c
>>>
>> I tested your patch and the whole UnixBench result is like this:
>>
>> Before patch, single thread:
>>
>> System Benchmarks Index Values               BASELINE       RESULT    INDEX
>> Dhrystone 2 using register variables         116700.0    9235787.7    791.4
>> Double-Precision Whetstone                       55.0       2758.7    501.6
>> Execl Throughput                                 43.0       2386.8    555.1
>> File Copy 1024 bufsize 2000 maxblocks          3960.0     191752.0    484.2
>> File Copy 256 bufsize 500 maxblocks            1655.0      78737.9    475.8
>> File Copy 4096 bufsize 8000 maxblocks          5800.0     297402.5    512.8
>> Pipe Throughput                               12440.0     353658.1    284.3
>> Pipe-based Context Switching                   4000.0     120140.8    300.4
>> Process Creation                                126.0       5735.0    455.2
>> Shell Scripts (1 concurrent)                     42.4       2701.5    637.1
>> Shell Scripts (8 concurrent)                      6.0        894.9   1491.5
>> System Call Overhead                          15000.0     557467.4    371.6
>>                                                                    ========
>> System Benchmarks Index Score                                         516.1
>>
>> After patch, single thread:
>>
>> System Benchmarks Index Values               BASELINE       RESULT    INDEX
>> Dhrystone 2 using register variables         116700.0    9235688.9    791.4
>> Double-Precision Whetstone                       55.0       2758.7    501.6
>> Execl Throughput                                 43.0       2377.8    553.0
>> File Copy 1024 bufsize 2000 maxblocks          3960.0     192545.5    486.2
>> File Copy 256 bufsize 500 maxblocks            1655.0      79735.0    481.8
>> File Copy 4096 bufsize 8000 maxblocks          5800.0     299621.9    516.6
>> Pipe Throughput                               12440.0     354969.1    285.3
>> Pipe-based Context Switching                   4000.0     118307.5    295.8
>> Process Creation                                126.0       5757.0    456.9
>> Shell Scripts (1 concurrent)                     42.4       2695.4    635.7
>> Shell Scripts (8 concurrent)                      6.0        894.4   1490.6
>> System Call Overhead                          15000.0     563582.7    375.7
>>                                                                    ========
>> System Benchmarks Index Score                                         517.0
>>
>> Before patch, multi-threads:
>>
>> System Benchmarks Index Values               BASELINE       RESULT    INDEX
>> Dhrystone 2 using register variables         116700.0   36943633.4   3165.7
>> Double-Precision Whetstone                       55.0      11035.8   2006.5
>> Execl Throughput                                 43.0       8800.1   2046.5
>> File Copy 1024 bufsize 2000 maxblocks          3960.0     277638.3    701.1
>> File Copy 256 bufsize 500 maxblocks            1655.0      92530.5    559.1
>> File Copy 4096 bufsize 8000 maxblocks          5800.0     524344.3    904.0
>> Pipe Throughput                               12440.0    1359237.2   1092.6
>> Pipe-based Context Switching                   4000.0     571511.4   1428.8
>> Process Creation                                126.0      20823.3   1652.6
>> Shell Scripts (1 concurrent)                     42.4       6883.9   1623.6
>> Shell Scripts (8 concurrent)                      6.0        981.7   1636.1
>> System Call Overhead                          15000.0    2029539.8   1353.0
>>                                                                    ========
>> System Benchmarks Index Score                                        1367.4
>>
>> After patch, multi-threads:
>>
>> System Benchmarks Index Values               BASELINE       RESULT    INDEX
>> Dhrystone 2 using register variables         116700.0   36943793.6   3165.7
>> Double-Precision Whetstone                       55.0      11035.5   2006.4
>> Execl Throughput                                 43.0       8768.3   2039.1
>> File Copy 1024 bufsize 2000 maxblocks          3960.0     277962.9    701.9
>> File Copy 256 bufsize 500 maxblocks            1655.0      92059.7    556.3
>> File Copy 4096 bufsize 8000 maxblocks          5800.0     525937.5    906.8
>> Pipe Throughput                               12440.0    1361566.6   1094.5
>> Pipe-based Context Switching                   4000.0     575835.4   1439.6
>> Process Creation                                126.0      20426.4   1621.1
>> Shell Scripts (1 concurrent)                     42.4       6877.5   1622.0
>> Shell Scripts (8 concurrent)                      6.0        980.3   1633.8
>> System Call Overhead                          15000.0    2049771.6   1366.5
>>                                                                    ========
>> System Benchmarks Index Score                                        1366.6
>>
>> From my point of view, the benefit is negligible.
>
> There is another way to look at what is going on.
> Since this patch is related to syscalls, I prefer to
> focus on "System Call Overhead" in the test results.
>
> Here are the INDEX values of "System Call Overhead" in your test results:
>
> thread    before_patch    after_patch    gain
> single    371.6           375.7          1.103%
> multi     1353.0          1366.5         0.998%
>
> For now, I would like to wait for other people's reviews.
> If the conclusion is that the optimization is meaningless,
> I am fine with dropping this patch.
Any comments will be much appreciated.
Here is the link:
https://lore.kernel.org/lkml/1655806074-17454-3-git-send-email-yangtiezhu@loongson.cn/
Thanks,
Tiezhu
>
> Thanks,
> Tiezhu
>
>>
>>
>> Huacai
>>
>>>
>>> Signed-off-by: Tiezhu Yang <yangtiezhu@...ngson.cn>
>>> ---
>>> arch/loongarch/include/asm/stackframe.h | 5 +++++
>>> arch/loongarch/kernel/entry.S | 15 +++++++++++++++
>>> 2 files changed, 20 insertions(+)
>>>
>>> diff --git a/arch/loongarch/include/asm/stackframe.h
>>> b/arch/loongarch/include/asm/stackframe.h
>>> index 4ca9530..551ab8f 100644
>>> --- a/arch/loongarch/include/asm/stackframe.h
>>> +++ b/arch/loongarch/include/asm/stackframe.h
>>> @@ -216,4 +216,9 @@
>>> RESTORE_SP_AND_RET \docfi
>>> .endm
>>>
>>> + .macro RESTORE_STATIC_SOME_SP_AND_RET docfi=0
>>> + RESTORE_STATIC \docfi
>>> + RESTORE_SOME \docfi
>>> + RESTORE_SP_AND_RET \docfi
>>> + .endm
>>> #endif /* _ASM_STACKFRAME_H */
>>> diff --git a/arch/loongarch/kernel/entry.S
>>> b/arch/loongarch/kernel/entry.S
>>> index d5b3dbc..c764c99 100644
>>> --- a/arch/loongarch/kernel/entry.S
>>> +++ b/arch/loongarch/kernel/entry.S
>>> @@ -14,6 +14,7 @@
>>> #include <asm/regdef.h>
>>> #include <asm/stackframe.h>
>>> #include <asm/thread_info.h>
>>> +#include <asm/unistd.h>
>>>
>>> .text
>>> .cfi_sections .debug_frame
>>> @@ -62,9 +63,23 @@ SYM_FUNC_START(handle_syscall)
>>> li.d tp, ~_THREAD_MASK
>>> and tp, tp, sp
>>>
>>> + /* The syscall number is held in a7; save it in TI_SYSCALL. */
>>> + LONG_S a7, tp, TI_SYSCALL
>>> +
>>> move a0, sp
>>> bl do_syscall
>>>
>>> + /*
>>> + * The syscall number held in a7 was saved in TI_SYSCALL above.
>>> + * rt_sigreturn calls RESTORE_ALL_AND_RET.
>>> + * The other syscalls call RESTORE_STATIC_SOME_SP_AND_RET.
>>> + */
>>> + LONG_L t3, tp, TI_SYSCALL
>>> + li.w t4, __NR_rt_sigreturn
>>> + beq t3, t4, 1f
>>> +
>>> + RESTORE_STATIC_SOME_SP_AND_RET
>>> +1:
>>> RESTORE_ALL_AND_RET
>>> SYM_FUNC_END(handle_syscall)
>>>
>>> --
>>> 2.1.0
>>>