linux-kernel - Re: [PATCH v2 0/2] RISC-V: Optimize memset for data sizes less than 16 bytes

Message-ID: <20230511-75718c538818fb3e1d924f9a@orel>
Date:   Thu, 11 May 2023 09:44:39 +0200
From:   Andrew Jones <ajones@...tanamicro.com>
To:     zhangfei <zhang_fei_0403@....com>
Cc:     linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
        aou@...s.berkeley.edu, palmer@...belt.com,
        paul.walmsley@...ive.com, conor.dooley@...rochip.com,
        zhangfei@...iscas.ac.cn
Subject: Re: [PATCH v2 0/2] RISC-V: Optimize memset for data sizes less than
 16 bytes

On Thu, May 11, 2023 at 09:26:04AM +0800, zhangfei wrote:
> From: zhangfei <zhangfei@...iscas.ac.cn>
> 
> At present, memset stores one byte at a time when processing tail data or
> when the initial data size is less than 16 bytes, which is inefficient.
> Therefore, I fill the head and tail with minimal branching. Each conditional
> ensures that all subsequently used offsets are well-defined and within the
> dest region. Although this approach may result in redundant stores, compared
> to byte-by-byte stores it allows store instructions to execute in parallel,
> reduces the number of jumps, and ultimately improves performance.
> 
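
To illustrate the head/tail filling described in the paragraph above,
here is a minimal C sketch. It is not the code from the patch (which is
RISC-V assembly in arch/riscv/lib/memset.S); the function name
small_memset_sketch and the n <= 16 precondition are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT the kernel implementation: fill n <= 16 bytes
 * by storing from both ends of the buffer. Each size check guarantees
 * that every offset used after it lies inside [0, n). Some bytes are
 * written more than once, but there is no per-byte loop.
 */
static void *small_memset_sketch(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	unsigned char b = (unsigned char)c;

	assert(n <= 16);	/* assumed precondition of this sketch */

	if (n == 0)
		return dest;
	s[0] = b;
	s[n - 1] = b;
	if (n <= 2)
		return dest;

	/* n >= 3: offsets 1, 2, n-2 and n-3 are all in range */
	s[1] = b;
	s[2] = b;
	s[n - 2] = b;
	s[n - 3] = b;
	if (n <= 6)
		return dest;

	/* n >= 7: offsets 3 and n-4 are in range */
	s[3] = b;
	s[n - 4] = b;
	if (n <= 8)
		return dest;

	/* n >= 9: head covers 0..7, tail covers n-8..n-1 */
	s[4] = b;
	s[5] = b;
	s[6] = b;
	s[7] = b;
	s[n - 5] = b;
	s[n - 6] = b;
	s[n - 7] = b;
	s[n - 8] = b;
	return dest;
}
```

For n = 5, for example, the stores hit offsets 0, 4, 1, 2, 3 and 2
again: one redundant store, two branches after the zero check, and no
loop.
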
> I used the code linked below [1] for performance testing, and commented out
> the calls to the Arm-specific memset implementations so that it runs
> properly on the RISC-V platform.
> 
> [1] https://github.com/ARM-software/optimized-routines/blob/master/string/bench/memset.c#L53
> 
> The test platform is a RISC-V SiFive U74. The test data is as follows:
> 
> Before optimization
> ---------------------
> Random memset (bytes/ns):
>            memset_call 32K:0.45 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.30
> 
> Medium memset (bytes/ns):
>            memset_call 8B:0.18 16B:0.48 32B:0.91 64B:1.63 128B:2.71 256B:4.40 512B:5.67
> Large memset (bytes/ns):
>            memset_call 1K:6.62 2K:7.02 4K:7.46 8K:7.70 16K:7.82 32K:7.63 64K:1.40
> 
> After optimization
> ---------------------
> Random memset (bytes/ns):
>            memset_call 32K:0.46 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.31
> Medium memset (bytes/ns):
>            memset_call 8B:0.27 16B:0.48 32B:0.91 64B:1.64 128B:2.71 256B:4.40 512B:5.67
> Large memset (bytes/ns):
>            memset_call 1K:6.62 2K:7.02 4K:7.47 8K:7.71 16K:7.83 32K:7.63 64K:1.40
> 
> The results show that memset performance improves significantly for data
> sizes around 8B, from 0.18 bytes/ns to 0.27 bytes/ns (about 50%).
> 
> The previous work was as follows:
> 1. "[PATCH] riscv: Optimize memset"
>    6d1cbe2e.3c31d.187eb14d990.Coremail.zhangfei@...iscas.ac.cn

Cover letters should have a changelog, in this case a couple phrases
stating what's different in v2 vs. v1.

Thanks,
drew

> 
> Thanks,
> Fei Zhang
> 
> Andrew Jones (1):
>   RISC-V: lib: Improve memset assembler formatting
> 
>  arch/riscv/lib/memset.S | 143 ++++++++++++++++++++--------------------
>  1 file changed, 72 insertions(+), 71 deletions(-)
> 
> zhangfei (1):
>   RISC-V: lib: Optimize memset performance
> 
>  arch/riscv/lib/memset.S | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
>