Message-Id: <20230511014243.3336-1-zhang_fei_0403@163.com>
Date: Thu, 11 May 2023 09:42:43 +0800
From: zhangfei <zhang_fei_0403@....com>
To: ajones@...tanamicro.com
Cc: aou@...s.berkeley.edu, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org, palmer@...belt.com,
paul.walmsley@...ive.com, zhang_fei_0403@....com,
zhangfei@...iscas.ac.cn
Subject: Re: [PATCH] riscv: Optimize memset
From: zhangfei <zhangfei@...iscas.ac.cn>
On Wed, May 10, 2023 at 14:58:22PM +0200, Andrew Jones wrote:
> On Wed, May 10, 2023 at 11:52:43AM +0800, zhangfei wrote:
> > From: zhangfei <zhangfei@...iscas.ac.cn>
> >
> > On Tue, May 09, 2023 11:16:33AM +0200, Andrew Jones wrote:
> > > On Tue, May 09, 2023 at 10:22:07AM +0800, zhangfei wrote:
> > > >
> > > > Hi,
> > > >
> > > > I filled head and tail with minimal branching. Each conditional ensures that
> > > > all the subsequently used offsets are well-defined and in the dest region.
> > >
> > > I know. You trimmed my comment, so I'll quote myself, here
> > >
> > > """
> > > After the check of a2 against 6 above we know that offsets 6(t0)
> > > and -7(a3) are safe. Are we trying to avoid too many redundant
> > > stores with these additional checks?
> > > """
> > >
> > > So, again. Why the additional check against 8 above and, the one you
> > > trimmed, checking 10?
> >
> > Hi,
> >
> > These additional checks are to avoid too many redundant stores.
> >
> > Adding a check for more than 8 bytes is because after the loop
> > segment '3' comes out, the remaining bytes are less than 8 bytes,
> > which also avoids redundant stores.
>
> So the benchmarks showed these additional checks were necessary to avoid
> making memset worse? Please add comments to the code explaining the
> purpose of the checks.

Hi,

As you suggested, dropping these additional checks makes memset worse.
When I removed the checks against 8 and 10 above, the benchmarks showed
memset dropping to 0.21 bytes/ns at 8B. That is still better than storing
byte by byte, but with the additional checks it reaches 0.27 bytes/ns.
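
For illustration, the head/tail filling can be sketched in C roughly as
follows. This is not the assembly from the patch: fill_bytes() and
check_fill_bytes() are hypothetical names, and the thresholds 2, 6 and 8
are assumptions picked for the sketch. Each early return skips stores
that would only rewrite bytes already covered by the overlapping head
and tail stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: fill_bytes() is a hypothetical name, and the
 * thresholds 2, 6 and 8 are assumptions chosen for this sketch, not
 * the offsets used in the riscv patch.
 */
static void *fill_bytes(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	size_t i;

	if (n == 0)
		return dest;

	/* n >= 1: first and last byte (the same byte when n == 1). */
	s[0] = (unsigned char)c;
	s[n - 1] = (unsigned char)c;
	if (n <= 2)
		return dest;

	/* n >= 3: offsets 1, 2, n-2 and n-3 are all inside the buffer. */
	s[1] = (unsigned char)c;
	s[2] = (unsigned char)c;
	s[n - 2] = (unsigned char)c;
	s[n - 3] = (unsigned char)c;
	if (n <= 6)
		return dest;

	/* n >= 7: offsets 3 and n-4 are inside the buffer. */
	s[3] = (unsigned char)c;
	s[n - 4] = (unsigned char)c;
	if (n <= 8)
		return dest;

	/* n >= 9: bytes 0..3 and n-4..n-1 are done, fill the middle. */
	for (i = 4; i < n - 4; i++)
		s[i] = (unsigned char)c;
	return dest;
}

/* Compare against memset() for lengths 0..64, with one guard byte each side. */
static void check_fill_bytes(void)
{
	unsigned char got[66], want[66];
	size_t n;

	for (n = 0; n <= 64; n++) {
		memset(got, 0xAA, sizeof(got));
		memset(want, 0xAA, sizeof(want));
		fill_bytes(got + 1, 0x5C, n);
		memset(want + 1, 0x5C, n);
		assert(memcmp(got, want, sizeof(got)) == 0);
	}
}
```

In this sketch the checks also keep every offset inside the buffer. In
the patch, per the discussion above, the offsets are already safe after
the check against 6, so the checks against 8 and 10 serve only to skip
redundant stores.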

I apologize for the confusing reply in my previous email. I have
reorganized the patch as v2 and sent it to you. Please reply under the
latest patch.

Thanks,
Fei Zhang