Message-ID: <4cd1d9d1-34da-7d96-46ac-a5470cfa85c3@kernel.org>
Date: Sat, 24 Jan 2026 01:14:58 -0700 (MST)
From: Paul Walmsley <pjw@...nel.org>
To: Feng Jiang <jiangfeng@...inos.cn>
cc: Paul Walmsley <pjw@...nel.org>, palmer@...belt.com, aou@...s.berkeley.edu, 
    alex@...ti.fr, samuel.holland@...ive.com, charlie@...osinc.com, 
    conor.dooley@...rochip.com, linux-riscv@...ts.infradead.org, 
    linux-kernel@...r.kernel.org
Subject: Re: [PATCH] riscv: lib: optimize strlen loop efficiency

On Thu, 15 Jan 2026, Feng Jiang wrote:

> On 2026/1/15 10:03, Paul Walmsley wrote:
> > On Thu, 18 Dec 2025, Feng Jiang wrote:
> > 
> >> Optimize the generic strlen implementation by using a pre-decrement
> >> pointer. This reduces the loop body from 4 instructions to 3 and
> >> eliminates the unconditional jump ('j').
> >>
> >> Old loop (4 instructions, 2 branches):
> >>   1: lbu t0, 0(t1); beqz t0, 2f; addi t1, t1, 1; j 1b
> >>
> >> New loop (3 instructions, 1 branch):
> >>   1: addi t1, t1, 1; lbu t0, 0(t1); bnez t0, 1b
> >>
> >> This change improves execution efficiency and reduces branch pressure
> >> for systems without the Zbb extension.
> > 
> > Looks reasonable; do you have any benchmarks on hardware that you can 
> > share?  Any reason why this patch stands alone and isn't rolled up as part 
> > of your "optimize string function" series?
> 
> Thanks for the feedback.
> 
> This patch predates the rest of the series, which is why it wasn't included
> in the 'optimize string function' rollup. At the time, I focused on correctness
> testing and observed the improvement through rdcycle instruction counts.
> 
> Since the series still needs further refinement and may take a longer time to
> complete, I was hoping this standalone optimization could be considered independently.

Ok.  Queued for v6.20.

Might be worth taking a look at David's suggestions for a followup patch?


- Paul
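
For readers following the thread, the two loop shapes quoted above can be
sketched in C. This is only an illustration under stated assumptions, not the
kernel's assembly: the function names strlen_old_shape, strlen_new_shape and
self_check are hypothetical, and an unsigned index that starts at (size_t)-1
stands in for the pointer that the patch description says is adjusted before
the loop, because forming a pointer one byte before the string is not valid C.
How a compiler lowers either function is not guaranteed to match the
instruction counts given in the patch description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Old shape: load, exit test, increment, then jump back to the top.
 * Mirrors "lbu; beqz 2f; addi; j 1b" (a conditional exit branch plus an
 * unconditional backward jump on every iteration).
 */
size_t strlen_old_shape(const char *s)
{
	size_t i = 0;

	for (;;) {
		if (s[i] == '\0')	/* lbu + beqz */
			break;
		i++;			/* addi */
	}				/* j 1b */
	return i;
}

/*
 * New shape: increment first, then load, then a single conditional
 * backward branch. Mirrors "addi; lbu; bnez 1b". The index starts one
 * position "before" the string; unsigned wraparound makes the first
 * increment land on 0, so no out-of-bounds access occurs.
 */
size_t strlen_new_shape(const char *s)
{
	size_t i = (size_t)-1;

	do {
		i++;			/* addi */
	} while (s[i] != '\0');		/* lbu + bnez */
	return i;
}

/* Hypothetical helper: check both shapes against the C library strlen(). */
void self_check(void)
{
	static const char *const samples[] = { "", "a", "riscv", "strlen loop" };
	size_t n;

	for (n = 0; n < sizeof(samples) / sizeof(samples[0]); n++) {
		assert(strlen_old_shape(samples[n]) == strlen(samples[n]));
		assert(strlen_new_shape(samples[n]) == strlen(samples[n]));
	}
}
```

Both functions return the same result for any NUL-terminated string; the
difference is purely in loop structure, which is what the patch changes.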
