linux-kernel - Re: [PATCH] riscv: lib: optimize strlen loop efficiency

Message-ID: <20260115111947.54929ed0@pumpkin>
Date: Thu, 15 Jan 2026 11:19:47 +0000
From: David Laight <david.laight.linux@...il.com>
To: Paul Walmsley <pjw@...nel.org>
Cc: Feng Jiang <jiangfeng@...inos.cn>, palmer@...belt.com,
 aou@...s.berkeley.edu, alex@...ti.fr, samuel.holland@...ive.com,
 charlie@...osinc.com, conor.dooley@...rochip.com,
 linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] riscv: lib: optimize strlen loop efficiency

On Wed, 14 Jan 2026 19:03:17 -0700 (MST)
Paul Walmsley <pjw@...nel.org> wrote:

> On Thu, 18 Dec 2025, Feng Jiang wrote:
> 
> > Optimize the generic strlen implementation by using a pre-decrement
> > pointer. This reduces the loop body from 4 instructions to 3 and
> > eliminates the unconditional jump ('j').
> > 
> > Old loop (4 instructions, 2 branches):
> >   1: lbu t0, 0(t1); beqz t0, 2f; addi t1, t1, 1; j 1b
> > 
> > New loop (3 instructions, 1 branch):
> >   1: addi t1, t1, 1; lbu t0, 0(t1); bnez t0, 1b

Is that a change to the generic C code?
Testing (++sc)[-1] might do the trick without requiring the extra read
of the first location.
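
As a rough illustration of that idea (a sketch only, not the kernel's
generic code; the name strlen_preinc is made up here), the loop could
look like this in C. Whether a given compiler turns it into the
three-instruction loop quoted above is an assumption that would need
checking in the generated assembly.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only. */
size_t strlen_preinc(const char *s)
{
	const char *sc = s;

	/* Step the pointer first, then test the byte just stepped over. */
	while ((++sc)[-1] != '\0')
		;

	/* sc now points one past the terminating NUL. */
	return (size_t)(sc - s) - 1;
}
```

The first byte is read inside the loop like every other byte, so no
separate read ahead of the loop is needed.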

> > 
> > This change improves execution efficiency and reduces branch pressure
> > for systems without the Zbb extension.
> 
> Looks reasonable; do you have any benchmarks on hardware that you can 
> share?  Any reason why this patch stands alone and isn't rolled up as part 
> of your "optimize string function" series?

For 64bit you can do a lot better (in C) by loading 64bit words and doing
the correct 'shift and mask' sequence to detect a zero byte.
It usually isn't worth it for 32bit.

It does need to handle a mis-aligned base - eg by masking the low bits
off the base pointer and or'ing non-zero values into the bytes of the
first word read that lie before the original pointer.
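
A minimal sketch of that approach, under stated assumptions: a
little-endian 64bit target, the GCC/Clang builtin __builtin_ctzll, and
made-up names (strlen_word, zero_bytes). The zero-byte test used is the
common subtract-and-mask form. The aligned loads can read bytes past
the terminating NUL (never across an aligned 8-byte boundary), which is
outside what ISO C guarantees, so treat this as an illustration rather
than a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/*
 * Non-zero iff some byte of v is zero. On little-endian the lowest
 * set bit of the result marks the first zero byte.
 */
static inline uint64_t zero_bytes(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/* Hypothetical helper, for illustration only (little-endian assumed). */
size_t strlen_word(const char *s)
{
	uintptr_t addr = (uintptr_t)s;
	unsigned int skew = addr & 7;		/* bytes before s in the first word */
	const unsigned char *p = (const unsigned char *)(addr - skew);
	uint64_t v, z;

	memcpy(&v, p, sizeof(v));		/* aligned 64bit load */
	/* Force the bytes that precede s to 0xff so they cannot match. */
	v |= (1ULL << (8 * skew)) - 1;

	while ((z = zero_bytes(v)) == 0) {
		p += sizeof(v);
		memcpy(&v, p, sizeof(v));
	}

	/* Bit index of the first flagged byte, divided by 8, is its offset. */
	return (size_t)((uintptr_t)p + (__builtin_ctzll(z) >> 3) - addr);
}
```

A big-endian variant would need a different way of locating the first
zero byte, because the borrow in the subtraction can flag bytes above
the first zero.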

	David