Message-Id: <20230511014243.3336-1-zhang_fei_0403@163.com>
Date:   Thu, 11 May 2023 09:42:43 +0800
From:   zhangfei <zhang_fei_0403@....com>
To:     ajones@...tanamicro.com
Cc:     aou@...s.berkeley.edu, linux-kernel@...r.kernel.org,
        linux-riscv@...ts.infradead.org, palmer@...belt.com,
        paul.walmsley@...ive.com, zhang_fei_0403@....com,
        zhangfei@...iscas.ac.cn
Subject: Re: [PATCH] riscv: Optimize memset 

From: zhangfei <zhangfei@...iscas.ac.cn>

On Wed, May 10, 2023 at 14:58:22PM +0200, Andrew Jones wrote:
> On Wed, May 10, 2023 at 11:52:43AM +0800, zhangfei wrote:
> > From: zhangfei <zhangfei@...iscas.ac.cn>
> > 
> > On Tue, May 09, 2023 at 11:16:33AM +0200, Andrew Jones wrote:
> > > On Tue, May 09, 2023 at 10:22:07AM +0800, zhangfei wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > I filled head and tail with minimal branching. Each conditional ensures that 
> > > > all the subsequently used offsets are well-defined and in the dest region.
> > > 
> > > I know. You trimmed my comment, so I'll quote myself here:
> > > 
> > > """
> > > After the check of a2 against 6 above we know that offsets 6(t0)
> > > and -7(a3) are safe. Are we trying to avoid too many redundant
> > > stores with these additional checks?
> > > """
> > > 
> > > So, again: why the additional check against 8 above and, in the part
> > > you trimmed, the check against 10?
> > 
> > Hi,
> > 
> > These additional checks are there to avoid too many redundant stores.
> > 
> > The check for more than 8 bytes is added because, after the loop
> > labeled '3' exits, fewer than 8 bytes remain, so this check also
> > avoids redundant stores.
> 
> So the benchmarks showed these additional checks were necessary to avoid
> making memset worse? Please add comments to the code explaining the
> purpose of the checks.
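
The head-and-tail fill described in the quoted text appears to follow the small-size path of musl's generic C memset. The C sketch below is hypothetical and is not the assembly in this patch: the names head_tail_fill and head_tail_ok and the thresholds 2, 6 and 8 are assumptions chosen for illustration. Its comments show the kind of per-check explanation Andrew asks for: each early return guarantees that every index used after it lies inside dest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical small-size fill: overlapping byte stores from both ends.
 * Each length check guarantees that every index used after it is in [0, n).
 * Returns 1 if all n bytes were written, 0 if n > 8 and the caller still
 * has to fill dest[4 .. n-5] (e.g. with a word-sized loop). */
static int head_tail_fill(unsigned char *dest, unsigned char c, size_t n)
{
    if (n == 0)
        return 1;
    dest[0] = c;                /* n >= 1: indices 0 and n-1 are valid */
    dest[n - 1] = c;
    if (n <= 2)
        return 1;
    dest[1] = c;                /* n >= 3: indices 1, 2, n-2, n-3 are valid */
    dest[2] = c;
    dest[n - 2] = c;
    dest[n - 3] = c;
    if (n <= 6)
        return 1;
    dest[3] = c;                /* n >= 7: indices 3 and n-4 are valid */
    dest[n - 4] = c;
    return n <= 8;              /* for n <= 8 head and tail now meet */
}

/* Checks one length: the guard bytes on either side of dest must stay
 * untouched, and when the fill reports completion every byte must be set. */
static int head_tail_ok(size_t n)
{
    unsigned char buf[32];
    assert(n + 2 <= sizeof buf);
    memset(buf, 0xEE, sizeof buf);              /* 0xEE marks "not written" */
    int done = head_tail_fill(buf + 1, 0x5A, n);
    if (buf[0] != 0xEE || buf[n + 1] != 0xEE)
        return 0;                               /* wrote outside dest */
    if (done)
        for (size_t i = 0; i < n; i++)
            if (buf[1 + i] != 0x5A)
                return 0;                       /* missed a byte */
    return done == (n <= 8);
}
```

For example, n = 7 writes indices 0 and 6, then 1, 2, 5 and 4, then 3 (twice), so every byte is covered with no loop and only three branches.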

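Andrew's point about the check of a2 against 6 can be written out as a bounds argument. This rests on assumptions not stated in the quoted excerpt: a2 holds the length n, t0 points at the first byte of the destination, a3 points one past its last byte (a3 = t0 + n), and the code reaching these stores has already branched away when n ≤ 6.

```latex
\begin{aligned}
a_2 = n > 6 &\;\Longrightarrow\; n \ge 7,\\
\texttt{6(t0)} = t_0 + 6,\quad 6 \le n - 1 &\;\Longrightarrow\; \text{inside } [t_0,\, t_0 + n),\\
\texttt{-7(a3)} = t_0 + n - 7,\quad n - 7 \ge 0 &\;\Longrightarrow\; \text{inside } [t_0,\, t_0 + n).
\end{aligned}
```

Under these assumptions every offset from 0(t0) to 6(t0) and from -7(a3) to -1(a3) is already in range, so any further checks serve only to avoid redundant stores, not to keep the stores safe.
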
Hi,

As you suggested, without these additional checks memset gets worse.
When I removed the checks against 8 and 10, the benchmark showed memset
dropping to 0.21 bytes/ns at 8 B. Although that is still better than
storing byte by byte, keeping the additional checks improves it further,
to 0.27 bytes/ns.
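
To put the two 8 B results in per-call terms, assuming the benchmark's bytes/ns figure is simply the 8 bytes written divided by the average time of one call:

```latex
\begin{aligned}
t_{\text{without checks}} &= \frac{8\ \text{B}}{0.21\ \text{B/ns}} \approx 38.1\ \text{ns},\\
t_{\text{with checks}} &= \frac{8\ \text{B}}{0.27\ \text{B/ns}} \approx 29.6\ \text{ns},\\
\frac{0.27}{0.21} &\approx 1.29 .
\end{aligned}
```

That is roughly 29% more throughput, or about 22% less time per 8-byte call.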

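Fei's second reason, that fewer than 8 bytes remain when the word loop exits, is what makes a single overlapping tail store sufficient. The C sketch below is hypothetical and not taken from the patch: word_fill and word_fill_ok are illustrative names, it assumes a 64-bit target on which unaligned 8-byte stores are acceptable (memcpy stands in for such a store in portable C), and it uses a plain byte loop for n < 8 where a real memset would use a head/tail fill. The n < 8 test is what guarantees that dest + n - 8 is a valid address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word fill: 8-byte stores, then one overlapping 8-byte store
 * that ends exactly at dest + n. Returns how many bytes were written twice. */
static size_t word_fill(unsigned char *dest, unsigned char c, size_t n)
{
    uint64_t w = 0x0101010101010101ULL * c;     /* c replicated in every byte */
    size_t off = 0;

    if (n < 8) {                                /* small sizes: plain byte stores */
        for (size_t i = 0; i < n; i++)
            dest[i] = c;
        return 0;
    }
    while (n - off >= 8) {                      /* main loop: one word per iteration */
        memcpy(dest + off, &w, 8);
        off += 8;
    }
    /* The loop exits with n - off < 8, so one word store ending at dest + n
     * covers the remainder and rewrites 8 - (n - off) bytes. */
    size_t rem = n - off;
    if (rem == 0)
        return 0;
    memcpy(dest + n - 8, &w, 8);
    return 8 - rem;
}

/* Checks one length: guards untouched, all n bytes set, and the number of
 * redundantly written bytes matches the expectation. */
static int word_fill_ok(size_t n, size_t expected_redundant)
{
    unsigned char buf[64];
    assert(n + 2 <= sizeof buf);
    memset(buf, 0xEE, sizeof buf);              /* 0xEE marks "not written" */
    size_t redundant = word_fill(buf + 1, 0x5A, n);
    if (buf[0] != 0xEE || buf[n + 1] != 0xEE)
        return 0;                               /* wrote outside dest */
    for (size_t i = 0; i < n; i++)
        if (buf[1 + i] != 0x5A)
            return 0;                           /* missed a byte */
    return redundant == expected_redundant;
}
```

For example, n = 9 does one loop iteration and then rewrites 7 bytes with the tail store, while n = 8 and n = 16 need no tail store at all.
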
I apologize for the disorganized replies in my previous email. I have
reorganized the changes into patch v2 and sent it to you; please reply
to the latest patch.

Thanks,
Fei Zhang
