Message-ID: <aW8whGcAR0x6FSRJ@smile.fi.intel.com>
Date: Tue, 20 Jan 2026 09:36:36 +0200
From: Andy Shevchenko <andriy.shevchenko@...el.com>
To: Feng Jiang <jiangfeng@...inos.cn>
Cc: pjw@...nel.org, palmer@...belt.com, aou@...s.berkeley.edu,
alex@...ti.fr, akpm@...ux-foundation.org, kees@...nel.org,
andy@...nel.org, ebiggers@...nel.org, martin.petersen@...cle.com,
ardb@...nel.org, charlie@...osinc.com, conor.dooley@...rochip.com,
ajones@...tanamicro.com, linus.walleij@...aro.org,
nathan@...nel.org, linux-riscv@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org
Subject: Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests

On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
> This series provides optimized implementations of strnlen(), strchr(),
> and strrchr() for the RISC-V architecture. The strnlen implementation
> is derived from the existing optimized strlen. For strchr and strrchr,

strchr() and strrchr()

> the current versions use simple byte-by-byte assembly logic, which
> will serve as a baseline for future Zbb-based optimizations.
>
> The patch series is organized into three parts:
> 1. Correctness Testing: The first three patches add KUnit test cases
> for strlen, strnlen, and strrchr to ensure the baseline and optimized

strlen(), strnlen(), and strrchr()

> versions are functionally correct.
> 2. Benchmarking Tool: Patches 4 and 5 extend string_kunit to include
> performance measurement capabilities, allowing for comparative
> analysis within the KUnit environment.
> 3. Architectural Optimizations: The final three patches introduce the
> RISC-V specific assembly implementations.
>
> Following suggestions from Andy Shevchenko, performance benchmarks have
> been added to string_kunit.c to provide quantifiable evidence of the
> improvements. Andy provided many specific comments on the implementation
> of the benchmark logic, which is also inspired by Eric Biggers'
> crc_benchmark(). Performance was measured in a QEMU TCG (rv64) environment,
> comparing the generic C implementation with the new RISC-V assembly versions.
>
> Performance Summary (Improvement %):
> ---------------------------------------------------------------
> Function | 16 B (Short) | 512 B (Mid) | 4096 B (Long)
> ---------------------------------------------------------------
> strnlen  | +64.0%       | +346.2%     | +410.7%

This is still suspicious.

> strchr   | +4.0%        | +6.4%       | +1.5%
> strrchr  | +6.6%        | +2.8%       | +0.0%
> ---------------------------------------------------------------
> The benchmarks can be reproduced by enabling CONFIG_STRING_KUNIT_BENCH
> and running: ./tools/testing/kunit/kunit.py run --arch=riscv \
> --cross_compile=riscv64-linux-gnu- --kunitconfig=my_string.kunitconfig \
> --raw_output
>
> The strnlen implementation leverages the Zbb 'orc.b' instruction and

strnlen()

> word-at-a-time logic, showing significant gains as the string length
> increases.

Hmm... Have you tried to optimise the generic implementation to use
word-at-a-time logic and compare?

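To make the question concrete, here is a minimal sketch of what a generic
word-at-a-time strnlen() could look like in portable C. It is an illustration
only, not the kernel's implementation and not code from this series: the names
strnlen_wordwise(), word_has_zero(), ONES and HIGHS are hypothetical, and the
zero-byte test is the classic bit trick rather than Zbb 'orc.b'. It also
assumes that reading a whole aligned word which contains the terminating NUL
is acceptable, which holds for the usual page-granular mappings but is not
guaranteed by ISO C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper constants: 0x01 and 0x80 repeated in every byte. */
#define ONES  ((size_t)-1 / 0xff)
#define HIGHS (ONES * 0x80)

/* Nonzero iff at least one byte of w is zero (classic bit trick). */
static inline int word_has_zero(size_t w)
{
	return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Sketch of a word-at-a-time strnlen(); name is illustrative only. */
size_t strnlen_wordwise(const char *s, size_t maxlen)
{
	size_t i = 0;

	/* Head: byte steps until s + i is word-aligned. */
	while (i < maxlen && ((uintptr_t)(s + i) % sizeof(size_t)) != 0) {
		if (s[i] == '\0')
			return i;
		i++;
	}

	/*
	 * Body: one aligned word per iteration, never reading past maxlen.
	 * Assumption: an aligned word containing the NUL may be read in full.
	 */
	while (maxlen - i >= sizeof(size_t)) {
		size_t w;

		memcpy(&w, s + i, sizeof(w));
		if (word_has_zero(w))
			break;
		i += sizeof(size_t);
	}

	/* Tail: find the exact NUL position, or stop at maxlen. */
	while (i < maxlen && s[i] != '\0')
		i++;

	return i;
}
```

Benchmarking something of this shape against the byte-wise generic version
would show how much of the strnlen() gain comes from the word-wise loop
itself and how much from the assembly.
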
> For strchr and strrchr, the handwritten assembly reduces

strchr() and strrchr()

> fixed overhead by eliminating stack frame management. The gain is most
> prominent on short strings (1-16B) where function call overhead dominates,
> while the performance converges with the C implementation for longer
> strings in the TCG environment.
--
With Best Regards,
Andy Shevchenko