linux-kernel - Re: [PATCH] riscv: lib: optimize strlen loop efficiency

Message-ID: <9ea5656b-2520-4057-9e4c-e4a8db947598@kylinos.cn>
Date: Mon, 26 Jan 2026 11:05:06 +0800
From: Feng Jiang <jiangfeng@...inos.cn>
To: Paul Walmsley <pjw@...nel.org>
Cc: palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr,
 samuel.holland@...ive.com, charlie@...osinc.com, conor.dooley@...rochip.com,
 linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] riscv: lib: optimize strlen loop efficiency

On 2026/1/24 16:14, Paul Walmsley wrote:
> On Thu, 15 Jan 2026, Feng Jiang wrote:
> 
>> On 2026/1/15 10:03, Paul Walmsley wrote:
>>> On Thu, 18 Dec 2025, Feng Jiang wrote:
>>>
>>>> Optimize the generic strlen implementation by using a pre-decrement
>>>> pointer. This reduces the loop body from 4 instructions to 3 and
>>>> eliminates the unconditional jump ('j').
>>>>
>>>> Old loop (4 instructions, 2 branches):
>>>>   1: lbu t0, 0(t1); beqz t0, 2f; addi t1, t1, 1; j 1b
>>>>
>>>> New loop (3 instructions, 1 branch):
>>>>   1: addi t1, t1, 1; lbu t0, 0(t1); bnez t0, 1b
>>>>
>>>> This change improves execution efficiency and reduces branch pressure
>>>> for systems without the Zbb extension.
>>>
>>> Looks reasonable; do you have any benchmarks on hardware that you can 
>>> share?  Any reason why this patch stands alone and isn't rolled up as part 
>>> of your "optimize string function" series?
>>
>> Thanks for the feedback.
>>
>> This patch predates the rest of the series, which is why it wasn't included
>> in the 'optimize string function' rollup. At the time, I focused on correctness
>> testing and observed the improvement through cycle counts read with rdcycle.
>>
>> Since the series still needs further refinement and may take longer to
>> complete, I was hoping this standalone optimization could be considered independently.
> 
> Ok.  Queued for v6.20.
> 
> Might be worth taking a look at David's suggestions for a followup patch?
> 

Thanks for queuing this!

I am definitely planning to study David's suggestions. He has also provided a lot
of valuable feedback on my other patch series, and I will explore further improvements
for a follow-up patch.

-- 
With Best Regards,
Feng Jiang
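
For readers following the thread, the two loop shapes quoted in the patch
description can be modeled in C roughly as below. This is only an
illustrative sketch, not the kernel's assembly: the function names are
hypothetical, and an index that starts at (size_t)-1 stands in for the
pre-decremented pointer, since forming a pointer one before the start of
an array is not valid C even though it is harmless in assembly. The
instruction comments map each statement to the quoted loop bodies.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the old loop: load, exit test, increment, jump back. */
size_t strlen_post_increment(const char *s)
{
	size_t i = 0;

	for (;;) {
		unsigned char c = (unsigned char)s[i];	/* lbu  t0, 0(t1) */

		if (c == 0)				/* beqz t0, 2f */
			break;
		i++;					/* addi t1, t1, 1 */
	}						/* j    1b */
	return i;
}

/*
 * Model of the new loop: increment first, then load and loop while the
 * byte is non-zero. The index starts "one before" the string; unsigned
 * wraparound makes the first increment land on 0.
 */
size_t strlen_pre_increment(const char *s)
{
	size_t i = (size_t)-1;
	unsigned char c;

	do {
		i++;					/* addi t1, t1, 1 */
		c = (unsigned char)s[i];		/* lbu  t0, 0(t1) */
	} while (c != 0);				/* bnez t0, 1b */
	return i;
}

/* Hypothetical self-check: both models must agree on every sample. */
void check_strlen_models(void)
{
	static const char *const samples[] = { "", "a", "riscv", "no Zbb here" };
	size_t k;

	for (k = 0; k < sizeof(samples) / sizeof(samples[0]); k++)
		assert(strlen_post_increment(samples[k]) ==
		       strlen_pre_increment(samples[k]));
}
```

Calling check_strlen_models() from a small test program exercises both
variants, including the empty string, which is the case where the
pre-increment form leaves the loop after a single pass.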