linux-kernel - Re: [PATCH] riscv: Optimize memset


Message-ID: <20230509-b0dc346928ddc8d2b5690f67@orel>
Date:   Tue, 9 May 2023 11:16:33 +0200
From:   Andrew Jones <ajones@...tanamicro.com>
To:     zhangfei <zhang_fei_0403@....com>
Cc:     aou@...s.berkeley.edu, linux-kernel@...r.kernel.org,
        linux-riscv@...ts.infradead.org, palmer@...belt.com,
        paul.walmsley@...ive.com, zhangfei@...iscas.ac.cn
Subject: Re: [PATCH] riscv: Optimize memset

On Tue, May 09, 2023 at 10:22:07AM +0800, zhangfei wrote:
> From: zhangfei <zhangfei@...iscas.ac.cn>
> 
> > >  5:
> > > -	sb a1, 0(t0)
> > > -	addi t0, t0, 1
> > > -	bltu t0, a3, 5b
> > > +        sb a1, 0(t0)
> > > +        sb a1, -1(a3)
> > > +        li a4, 2
> > > +        bgeu a4, a2, 6f
> > > +
> > > +        sb a1, 1(t0)
> > > +        sb a1, 2(t0)
> > > +        sb a1, -2(a3)
> > > +        sb a1, -3(a3)
> > > +        li a4, 6
> > > +        bgeu a4, a2, 6f
> > > +
> > > +        sb a1, 3(t0)
> > > +        sb a1, -4(a3)
> > > +        li a4, 8
> > > +        bgeu a4, a2, 6f
> > 
> > Why is this check here?
> 
> Hi,
> 
> I filled head and tail with minimal branching. Each conditional ensures that 
> all the subsequently used offsets are well-defined and in the dest region.

I know. You trimmed my comment, so I'll quote myself here:

"""
After the check of a2 against 6 above we know that offsets 6(t0)
and -7(a3) are safe. Are we trying to avoid too many redundant
stores with these additional checks?
"""

So, again: why the additional check against 8 above and the one you
trimmed, which checks against 10?
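
For anyone following along, the quoted stores and checks can be sketched in
C roughly as below. This is only an illustration under assumptions, not the
patch itself: fill_small() and fill_small_ok() are hypothetical names, dest
stands in for t0, dest + n for a3, and n for a2 (assumed to be at least 1).
The byte loop at the end is a stand-in for the part of the patch that was
trimmed from the quote.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the quoted sb sequence; requires n >= 1. */
static void fill_small(unsigned char *dest, unsigned char c, size_t n)
{
    unsigned char *end = dest + n;   /* plays the role of a3 */

    dest[0] = c;                     /* sb a1, 0(t0)  */
    end[-1] = c;                     /* sb a1, -1(a3) */
    if (n <= 2)                      /* li a4, 2; bgeu a4, a2, 6f */
        return;

    dest[1] = c;
    dest[2] = c;
    end[-2] = c;
    end[-3] = c;
    if (n <= 6)                      /* li a4, 6; bgeu a4, a2, 6f */
        return;

    dest[3] = c;
    end[-4] = c;
    if (n <= 8)                      /* li a4, 8; bgeu a4, a2, 6f */
        return;

    /* Stand-in for the trimmed remainder: fill the middle byte by byte. */
    for (size_t i = 4; i + 4 < n; i++)
        dest[i] = c;
}

/* Returns 1 if fill_small() wrote exactly bytes [0, n) and nothing else. */
static int fill_small_ok(size_t n)
{
    unsigned char buf[32];
    const size_t pad = 8;            /* guard bytes on both sides */

    assert(n >= 1 && n + 2 * pad <= sizeof(buf));
    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = 0x00;

    fill_small(buf + pad, 0xAB, n);

    for (size_t i = 0; i < sizeof(buf); i++) {
        unsigned char want = (i >= pad && i < pad + n) ? 0xAB : 0x00;
        if (buf[i] != want)
            return 0;
    }
    return 1;
}
```

With n = 7 the sketch issues eight stores for seven bytes, so the overlap
between head and tail stores is where the redundancy comes from.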

> 
> Although this approach may result in redundant stores, compared to
> byte-by-byte stores it allows the store instructions to execute in parallel
> and reduces the number of branches.

I understood that when I read the code, but text like this should go in
the commit message to avoid people having to think their way through
stuff.

> 
> I used the code linked below for performance testing, and commented out the
> calls to the Arm-specific memset implementations in that code so that it
> runs properly on the RISC-V platform.
> 
> [1] https://github.com/ARM-software/optimized-routines/blob/master/string/bench/memset.c#L53
> 
> The test platform was a RISC-V SiFive U74. The test data is as follows:
> 
> Before optimization
> ---------------------
> Random memset (bytes/ns):
>            memset_call 32K:0.45 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.30
> 
> Medium memset (bytes/ns):
>            memset_call 8B:0.18 16B:0.48 32B:0.91 64B:1.63 128B:2.71 256B:4.40 512B:5.67
> Large memset (bytes/ns):
>            memset_call 1K:6.62 2K:7.02 4K:7.46 8K:7.70 16K:7.82 32K:7.63 64K:1.40
> 
> After optimization
> ---------------------
> Random memset (bytes/ns):
>            memset_call 32K:0.46 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.31
> Medium memset (bytes/ns):
>            memset_call 8B:0.27 16B:0.48 32B:0.91 64B:1.64 128B:2.71 256B:4.40 512B:5.67
> Large memset (bytes/ns):
>            memset_call 1K:6.62 2K:7.02 4K:7.47 8K:7.71 16K:7.83 32K:7.63 64K:1.40
> 
> The results show that memset performance improves noticeably at sizes around
> 8B, from 0.18 bytes/ns to 0.27 bytes/ns (about 1.5x); the other sizes are
> essentially unchanged.

And these benchmark results belong in the cover letter, which this series
is missing.

Thanks,
drew