Message-ID: <20230511-75718c538818fb3e1d924f9a@orel>
Date: Thu, 11 May 2023 09:44:39 +0200
From: Andrew Jones <ajones@...tanamicro.com>
To: zhangfei <zhang_fei_0403@....com>
Cc: linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
aou@...s.berkeley.edu, palmer@...belt.com,
paul.walmsley@...ive.com, conor.dooley@...rochip.com,
zhangfei@...iscas.ac.cn
Subject: Re: [PATCH v2 0/2] RISC-V: Optimize memset for data sizes less than 16 bytes

On Thu, May 11, 2023 at 09:26:04AM +0800, zhangfei wrote:
> From: zhangfei <zhangfei@...iscas.ac.cn>
>
> At present, memset uses byte-by-byte stores when processing tail data or when
> the initial data size is less than 16 bytes. This approach is not efficient.
> Therefore, I fill the head and tail with minimal branching. Each conditional
> ensures that all subsequently used offsets are well-defined and within the dest
> region. Although this approach may issue redundant stores, compared to
> byte-by-byte stores it allows store instructions to execute in parallel,
> reduces the number of jumps, and ultimately improves performance.
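
For readers following along, the head/tail fill described above can be sketched in C roughly as follows. This is only an illustration of the overlapping-store idea, not the assembly in the patch: the function name small_memset_sketch is hypothetical, the exact branch thresholds are an assumption, and sizes of 16 bytes or more are simply handed to the C library memset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: fill n bytes at dest with c using overlapping
 * head/tail stores instead of a byte loop. Not the patch's actual code. */
void *small_memset_sketch(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    unsigned char b = (unsigned char)c;

    /* Assumption: larger sizes take the regular (word-wise) path. */
    if (n >= 16)
        return memset(dest, c, n);

    if (n == 0)
        return dest;

    /* n >= 1: first and last byte (the same byte when n == 1). */
    s[0] = b;
    s[n - 1] = b;
    if (n <= 2)
        return dest;

    /* n >= 3: two more bytes from each end; stores may overlap. */
    s[1] = b;
    s[2] = b;
    s[n - 2] = b;
    s[n - 3] = b;
    if (n <= 6)
        return dest;

    /* n >= 7: bytes 0..3 and n-4..n-1 are now covered. */
    s[3] = b;
    s[n - 4] = b;
    if (n <= 8)
        return dest;

    /* 9 <= n <= 15: cover bytes 0..7 and n-8..n-1, which overlap. */
    s[4] = b;
    s[5] = b;
    s[6] = b;
    s[7] = b;
    s[n - 5] = b;
    s[n - 6] = b;
    s[n - 7] = b;
    s[n - 8] = b;
    return dest;
}
```

Each branch is only taken once n is known to be large enough that every offset it uses lies inside the buffer, so some bytes are written twice but no store lands outside [dest, dest + n).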
>
> I used the code linked below for performance testing, and commented out the
> Arm-specific memset calls in that code so that it runs properly on the
> RISC-V platform.
>
> [1] https://github.com/ARM-software/optimized-routines/blob/master/string/bench/memset.c#L53
>
> The test platform was a RISC-V SiFive U74. The test data is as follows:
>
> Before optimization
> ---------------------
> Random memset (bytes/ns):
> memset_call 32K:0.45 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.30
>
> Medium memset (bytes/ns):
> memset_call 8B:0.18 16B:0.48 32B:0.91 64B:1.63 128B:2.71 256B:4.40 512B:5.67
>
> Large memset (bytes/ns):
> memset_call 1K:6.62 2K:7.02 4K:7.46 8K:7.70 16K:7.82 32K:7.63 64K:1.40
>
> After optimization
> ---------------------
> Random memset (bytes/ns):
> memset_call 32K:0.46 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.31
>
> Medium memset (bytes/ns):
> memset_call 8B:0.27 16B:0.48 32B:0.91 64B:1.64 128B:2.71 256B:4.40 512B:5.67
>
> Large memset (bytes/ns):
> memset_call 1K:6.62 2K:7.02 4K:7.47 8K:7.71 16K:7.83 32K:7.63 64K:1.40
>
> From the results, memset performance improves significantly at data sizes of
> around 8B, from 0.18 bytes/ns to 0.27 bytes/ns (about 50%), while the larger
> sizes are essentially unchanged.
>
> The previous work was as follows:
> 1. "[PATCH] riscv: Optimize memset"
> 6d1cbe2e.3c31d.187eb14d990.Coremail.zhangfei@...iscas.ac.cn
Cover letters should have a changelog, in this case a couple of phrases
stating what's different in v2 vs. v1.
Thanks,
drew
>
> Thanks,
> Fei Zhang
>
> Andrew Jones (1):
> RISC-V: lib: Improve memset assembler formatting
>
> arch/riscv/lib/memset.S | 143 ++++++++++++++++++++--------------------
> 1 file changed, 72 insertions(+), 71 deletions(-)
>
> zhangfei (1):
> RISC-V: lib: Optimize memset performance
>
> arch/riscv/lib/memset.S | 40 +++++++++++++++++++++++++++++++++++++---
> 1 file changed, 37 insertions(+), 3 deletions(-)
>