linux-hardening - Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aWdD3N_jwnt_ncc1@smile.fi.intel.com>
Date: Wed, 14 Jan 2026 09:21:00 +0200
From: Andy Shevchenko <andriy.shevchenko@...el.com>
To: Feng Jiang <jiangfeng@...inos.cn>
Cc: pjw@...nel.org, palmer@...belt.com, aou@...s.berkeley.edu,
	alex@...ti.fr, kees@...nel.org, andy@...nel.org,
	akpm@...ux-foundation.org, ebiggers@...nel.org,
	martin.petersen@...cle.com, ardb@...nel.org,
	ajones@...tanamicro.com, conor.dooley@...rochip.com,
	samuel.holland@...ive.com, linus.walleij@...aro.org,
	nathan@...nel.org, linux-riscv@...ts.infradead.org,
	linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org
Subject: Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for
 strlen()

On Wed, Jan 14, 2026 at 03:04:58PM +0800, Feng Jiang wrote:
> On 2026/1/14 14:14, Feng Jiang wrote:
> > On 2026/1/13 16:46, Andy Shevchenko wrote:

...

> > Thank you for the catch. You are absolutely correct—the 2500x figure is heavily
> > distorted and does not reflect real-world performance.
> > 
> > I've found that by using a volatile function pointer to call the implementations
> > (instead of direct calls), the results returned to a realistic range. It appears
> > the previous benchmark logic allowed the compiler to over-optimize the test loop
> > in ways that skewed the data.
> > 
> > I will refactor the benchmark logic in v3, specifically referencing the crc32
> > KUnit implementation (e.g., using warm-up loops and adding preempt_disable()
> > to eliminate context-switch interference) to ensure the data is robust and accurate.
> > 
> 
> Just a quick follow-up: I've also verified that using a volatile variable to store
> the return value (as seen in crc_benchmark()) is equally effective at preventing
> the optimization.
> 
> The core change is as follows:
> 
>     volatile size_t len;
>     ...
>     for (unsigned int j = 0; j < iters; j++) {
>         OPTIMIZER_HIDE_VAR(buf);
>         len = strlen(buf);

But please, check for sure this is Linux kernel generic implementation (before)
and not __builtin_strlen() from GCC. (OTOH, it would be nice to benchmark that
one as well, although I think that __builtin_strlen() in general maybe slightly
better choice than Linux kernel generic implementation.) I.o.w. be sure *what*
you test.

>     }

Or using WRITE_ONCE() :-) But that one will probably be confusing as it usually
should be paired with READ_ONCE() somewhere else in the code. So, I agree on
crc_benchmark() approach taken.

> Preliminary results with this change look much more reasonable:
> 
>     ok 4 string_test_strlen
>     # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 4767500 ns
>     # string_test_strlen_bench:   generic C:      5815800 ns
>     # string_test_strlen_bench:   speedup:        1.21x
>     # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 6573600 ns
>     # string_test_strlen_bench:   generic C:      16342500 ns
>     # string_test_strlen_bench:   speedup:        2.48x
>     # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>     # string_test_strlen_bench:   arch-optimized: 7931000 ns
>     # string_test_strlen_bench:   generic C:      35347300 ns
>     # string_test_strlen_bench:   speedup:        4.45x
>     ok 5 string_test_strlen_bench
> 
> I will adopt this pattern in v3, along with cache warm-up and preempt_disable(),
> to stay consistent with existing kernel benchmarks and ensure robust measurements.

-- 
With Best Regards,
Andy Shevchenko