linux-kernel - Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()

Message-ID: <20260114102154.251082c6@pumpkin>
Date: Wed, 14 Jan 2026 10:21:54 +0000
From: David Laight <david.laight.linux@...il.com>
To: Feng Jiang <jiangfeng@...inos.cn>
Cc: Andy Shevchenko <andriy.shevchenko@...el.com>, pjw@...nel.org,
 palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr, kees@...nel.org,
 andy@...nel.org, akpm@...ux-foundation.org, ebiggers@...nel.org,
 martin.petersen@...cle.com, ardb@...nel.org, ajones@...tanamicro.com,
 conor.dooley@...rochip.com, samuel.holland@...ive.com,
 linus.walleij@...aro.org, nathan@...nel.org,
 linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
 linux-hardening@...r.kernel.org
Subject: Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark
 for strlen()

On Wed, 14 Jan 2026 15:04:58 +0800
Feng Jiang <jiangfeng@...inos.cn> wrote:

> On 2026/1/14 14:14, Feng Jiang wrote:
> > On 2026/1/13 16:46, Andy Shevchenko wrote:  
> >> On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:  
> >>> Introduce a benchmark to compare the architecture-optimized strlen()
> >>> implementation against the generic C version (__generic_strlen).
> >>>
> >>> The benchmark uses a table-driven approach to evaluate performance
> >>> across different string lengths (short, medium, and long). It employs
> >>> ktime_get() for timing and get_random_bytes() followed by null-byte
> >>> filtering to generate test data that prevents early termination.
> >>>
> >>> This helps in quantifying the performance gains of architecture-specific
> >>> optimizations on various platforms.  
...
> Preliminary results with this change look much more reasonable:
> 
>     ok 4 string_test_strlen
>     # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 4767500 ns
>     # string_test_strlen_bench:   generic C:      5815800 ns
>     # string_test_strlen_bench:   speedup:        1.21x
>     # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 6573600 ns
>     # string_test_strlen_bench:   generic C:      16342500 ns
>     # string_test_strlen_bench:   speedup:        2.48x
>     # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>     # string_test_strlen_bench:   arch-optimized: 7931000 ns
>     # string_test_strlen_bench:   generic C:      35347300 ns

That is far too long a measurement.
In 35ms you are including a lot of timer interrupts.
You are also just testing the 'hot cache' case.
The kernel runs 'cold cache' a lot of the time - especially for instructions.

To time short loops (or even single passes) you need two data dependencies.
The first, between the 'start time' and the code being tested, is easy
enough: just add (time & non_compile_time_zero) to a parameter.
The second, between the result of the code and the 'end time', is somewhat
harder (doable on x86 if you use the pmc cycle counter).
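
A minimal userspace sketch of the start-side dependency described above, not taken from the patch series. The names now_ns(), timed_strlen() and the variable non_compile_time_zero are hypothetical, and clock_gettime() stands in for whatever clock the real test would use. The end-side dependency is only approximated with a GCC/Clang compiler barrier, which does not order anything at the CPU level; the pmc-based approach is not shown.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical name: always zero at run time, but 'volatile' stops the
 * compiler from proving that, so it must really be loaded and used.
 */
static volatile uint64_t non_compile_time_zero;

/* Monotonic timestamp in nanoseconds (assumption: a POSIX clock). */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time one strlen() pass: returns the length, stores elapsed ns. */
static size_t timed_strlen(const char *s, uint64_t *elapsed_ns)
{
	uint64_t start = now_ns();

	/*
	 * 'off' is always 0, but it is computed from 'start', so the
	 * argument passed to strlen() depends on the start timestamp.
	 */
	size_t off = (size_t)(start & non_compile_time_zero);
	size_t len = strlen(s + off);

	/*
	 * Compiler barrier only (GCC/Clang extension): 'len' must be
	 * computed before this point. This is weaker than a true data
	 * dependency between the result and the end timestamp.
	 */
	__asm__ __volatile__("" : "+r"(len) : : "memory");

	*elapsed_ns = now_ns() - start;
	return len;
}
```

Calling timed_strlen(buf, &ns) once per sample gives a single-pass timing; whether the clock's resolution is fine enough for an 8-byte string is a separate question that this sketch does not address.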

	David


>     # string_test_strlen_bench:   speedup:        4.45x
>     ok 5 string_test_strlen_bench
> 
> I will adopt this pattern in v3, along with cache warm-up and preempt_disable(),
> to stay consistent with existing kernel benchmarks and ensure robust measurements.
>