linux-kernel - Re: [PATCH] crypto: xor - fix template benchmarking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1381b751-18cf-4872-99ec-17b4b629d3ef@bell.net>
Date: Thu, 20 Jun 2024 08:34:10 -0400
From: John David Anglin <dave.anglin@...l.net>
To: Helge Deller <deller@...nel.org>, Herbert Xu
 <herbert@...dor.apana.org.au>, linux-crypto@...r.kernel.org,
 Ard Biesheuvel <ardb@...nel.org>, linux-parisc@...r.kernel.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] crypto: xor - fix template benchmarking

On 2024-06-19 1:31 p.m., Helge Deller wrote:
> Commit c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
> switched from using jiffies to ktime-based performance benchmarking.
>
> This works nicely on machines which have a fine-grained ktime()
> clocksource as e.g. x86 machoines with TSC.
> But other machines, e.g. my 4-way HP PARISC server, don't have such
> fine-grained clocksources, which is why it seems that 800 xor loops
> take zero seconds, which then calculates in the logs as:
>
>   xor: measuring software checksum speed
>      8regs           : -1018167296 MB/sec
>      8regs_prefetch  : -1018167296 MB/sec
>      32regs          : -1018167296 MB/sec
>      32regs_prefetch : -1018167296 MB/sec
>
> Fix this with some small modifications to the existing code to improve
> the algorithm to always produce correct results without introducing
> major delays for architectures with a fine-grained ktime()
> clocksource:
> a) Delay start of the timing until ktime() just advanced. On machines
> with a fast ktime() this should be just one additional ktime() call.
> b) Count the number of loops. Run at minimum 800 loops and finish
> earliest when the ktime() counter has progressed.
>
> With that the throughput can now be calculated more accurately under all
> conditions.
>
> Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
> Signed-off-by: Helge Deller <deller@....de>
You can add my "Tested-by".

I wonder if prefetch versions are implemented correctly on parisc:

[   29.353868] xor: measuring software checksum speed
[   29.360030]    8regs           :  2266 MB/sec
[   29.368031]    8regs_prefetch  :  2076 MB/sec
[   29.376031]    32regs          :  2259 MB/sec
[   29.384031]    32regs_prefetch :  2075 MB/sec
[   29.384080] xor: using function: 8regs (2266 MB/sec)

>
> diff --git a/crypto/xor.c b/crypto/xor.c
> index 8e72e5d5db0d..29b4c0fd89d7 100644
> --- a/crypto/xor.c
> +++ b/crypto/xor.c
> @@ -83,33 +83,29 @@ static void __init
>   do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
>   {
>   	int speed;
> -	int i, j;
> -	ktime_t min, start, diff;
> +	unsigned long reps;
> +	ktime_t min, start, t0;
>   
>   	tmpl->next = template_list;
>   	template_list = tmpl;
>   
>   	preempt_disable();
>   
> -	min = (ktime_t)S64_MAX;
> -	for (i = 0; i < 3; i++) {
> -		start = ktime_get();
> -		for (j = 0; j < REPS; j++) {
> -			mb(); /* prevent loop optimization */
> -			tmpl->do_2(BENCH_SIZE, b1, b2);
> -			mb();
> -		}
> -		diff = ktime_sub(ktime_get(), start);
> -		if (diff < min)
> -			min = diff;
> -	}
> +	t0 = ktime_get();
> +	/* delay start until time has advanced */
> +	do { start = ktime_get(); } while (start == t0);
> +	reps = 0;
> +	do {
> +		mb(); /* prevent loop optimization */
> +		tmpl->do_2(BENCH_SIZE, b1, b2);
> +		mb();
> +	} while (reps++ < REPS || (t0 = ktime_get()) == start);
> +	min = ktime_sub(t0, start);
>   
>   	preempt_enable();
>   
>   	// bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
> -	if (!min)
> -		min = 1;
> -	speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
> +	speed = (1000 * reps * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
>   	tmpl->speed = speed;
>   
>   	pr_info("   %-16s: %5d MB/sec\n", tmpl->name, speed);
>


-- 
John David Anglin  dave.anglin@...l.net