[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260121105717.04853c5d@pumpkin>
Date: Wed, 21 Jan 2026 10:57:17 +0000
From: David Laight <david.laight.linux@...il.com>
To: Andy Shevchenko <andy.shevchenko@...il.com>
Cc: Feng Jiang <jiangfeng@...inos.cn>, Andy Shevchenko
<andriy.shevchenko@...el.com>, pjw@...nel.org, palmer@...belt.com,
aou@...s.berkeley.edu, alex@...ti.fr, akpm@...ux-foundation.org,
kees@...nel.org, andy@...nel.org, ebiggers@...nel.org,
martin.petersen@...cle.com, ardb@...nel.org, charlie@...osinc.com,
conor.dooley@...rochip.com, ajones@...tanamicro.com,
linus.walleij@...aro.org, nathan@...nel.org,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-hardening@...r.kernel.org
Subject: Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit
tests
On Wed, 21 Jan 2026 09:01:29 +0200
Andy Shevchenko <andy.shevchenko@...il.com> wrote:
...
> I understand that. My point is if we move the generic implementation
> to use word-at-a-time technique the difference should not go 4x,
> right? Perhaps 1.5x or so. I believe this will be a very useful
> exercise.
I posted a version earlier.
After the initial setup (aligning the base address and loading
some constants the loop on x86-64 is 7 instructions (should be similar
for other architectures).
I think it will execute in 4 clocks.
You then need to find the byte in the word, easy enough on LE with
a fast ffs() - but harder otherwise.
The real problem is the cost for short strings.
Like memcpy() you need a hint from the source of the 'expected' length
(as a compile-time constant) to compile-time select the algorithm.
OTOH:
for (;;) {
if (!ptr[0]) return ptr - start;
ptr += 2;
while (ptr[-1]);
return ptr - start - 1;
has two 'load+compare+branch' and one add per loop.
On x86 that might all overlap and give you a two-clock loop
that checks one byte every clock - faster than 'rep scasb'.
(You can get a two clock loop, but not a 1 clock loop.)
I think unrolling further will make little/no difference.
The break-even for the word-at-a-time version is probably at least 64
characters.
David
Powered by blists - more mailing lists