Message-ID: <1381b751-18cf-4872-99ec-17b4b629d3ef@bell.net>
Date: Thu, 20 Jun 2024 08:34:10 -0400
From: John David Anglin <dave.anglin@...l.net>
To: Helge Deller <deller@...nel.org>, Herbert Xu
<herbert@...dor.apana.org.au>, linux-crypto@...r.kernel.org,
Ard Biesheuvel <ardb@...nel.org>, linux-parisc@...r.kernel.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] crypto: xor - fix template benchmarking
On 2024-06-19 1:31 p.m., Helge Deller wrote:
> Commit c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
> switched from using jiffies to ktime-based performance benchmarking.
>
> This works nicely on machines which have a fine-grained ktime()
> clocksource, e.g. x86 machines with TSC.
> But other machines, e.g. my 4-way HP PARISC server, don't have such
> fine-grained clocksources, so 800 xor loops appear to take zero
> time, which then shows up in the logs as:
>
> xor: measuring software checksum speed
> 8regs : -1018167296 MB/sec
> 8regs_prefetch : -1018167296 MB/sec
> 32regs : -1018167296 MB/sec
> 32regs_prefetch : -1018167296 MB/sec
>
> Fix this with some small modifications to the existing code to improve
> the algorithm to always produce correct results without introducing
> major delays for architectures with a fine-grained ktime()
> clocksource:
> a) Delay the start of the timing until ktime() has just advanced. On
> machines with a fast ktime() this should cost just one additional
> ktime() call.
> b) Count the number of loops. Run at least 800 loops and do not finish
> before the ktime() counter has advanced.
>
> With that the throughput can now be calculated more accurately under all
> conditions.
>
> Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
> Signed-off-by: Helge Deller <deller@....de>
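
As a side note on the numbers quoted above: the value -1018167296 looks consistent with the old fallback path ("if (!min) min = 1") followed by a 32-bit wraparound. The sketch below is only an illustration in user-space C. It assumes BENCH_SIZE is 4096 and that the result ends up in a 32-bit int; neither is stated in this thread, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit unsigned value as two's-complement signed,
 * without relying on implementation-defined conversion. */
static int32_t wrap_to_s32(uint32_t u)
{
	if (u <= (uint32_t)INT32_MAX)
		return (int32_t)u;
	return (int32_t)((int64_t)u - 4294967296LL);
}

/* Pre-patch formula: 1000 * REPS * BENCH_SIZE / ns, computed in 32-bit
 * unsigned arithmetic and stored in an int, including the old
 * "if (!min) min = 1" fallback for a zero measurement.
 * bench_size = 4096 is an assumption, not taken from this thread. */
static int32_t old_speed(uint32_t reps, uint32_t bench_size, uint32_t ns)
{
	uint32_t product;

	assert(reps > 0 && bench_size > 0);
	if (ns == 0)
		ns = 1;
	product = (uint32_t)((uint32_t)1000 * reps * bench_size);
	return wrap_to_s32(product / ns);
}
```

Under those assumptions, old_speed(800, 4096, 0) gives 3276800000 as an unsigned value, which reads back as -1018167296 once it is stored in a signed 32-bit int.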

You can add my "Tested-by".

I wonder if the prefetch versions are implemented correctly on parisc,
since they come out roughly 8% slower than the non-prefetch ones:

[ 29.353868] xor: measuring software checksum speed
[ 29.360030] 8regs : 2266 MB/sec
[ 29.368031] 8regs_prefetch : 2076 MB/sec
[ 29.376031] 32regs : 2259 MB/sec
[ 29.384031] 32regs_prefetch : 2075 MB/sec
[ 29.384080] xor: using function: 8regs (2266 MB/sec)
>
> diff --git a/crypto/xor.c b/crypto/xor.c
> index 8e72e5d5db0d..29b4c0fd89d7 100644
> --- a/crypto/xor.c
> +++ b/crypto/xor.c
> @@ -83,33 +83,29 @@ static void __init
> do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
> {
> int speed;
> - int i, j;
> - ktime_t min, start, diff;
> + unsigned long reps;
> + ktime_t min, start, t0;
>
> tmpl->next = template_list;
> template_list = tmpl;
>
> preempt_disable();
>
> - min = (ktime_t)S64_MAX;
> - for (i = 0; i < 3; i++) {
> - start = ktime_get();
> - for (j = 0; j < REPS; j++) {
> - mb(); /* prevent loop optimization */
> - tmpl->do_2(BENCH_SIZE, b1, b2);
> - mb();
> - }
> - diff = ktime_sub(ktime_get(), start);
> - if (diff < min)
> - min = diff;
> - }
> + t0 = ktime_get();
> + /* delay start until time has advanced */
> + do { start = ktime_get(); } while (start == t0);
> + reps = 0;
> + do {
> + mb(); /* prevent loop optimization */
> + tmpl->do_2(BENCH_SIZE, b1, b2);
> + mb();
> + } while (reps++ < REPS || (t0 = ktime_get()) == start);
> + min = ktime_sub(t0, start);
>
> preempt_enable();
>
> // bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
> - if (!min)
> - min = 1;
> - speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
> + speed = (1000 * reps * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
> tmpl->speed = speed;
>
> pr_info(" %-16s: %5d MB/sec\n", tmpl->name, speed);
>
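
For anyone who wants to play with the loop structure outside the kernel, here is a rough user-space sketch of steps a) and b). It is not the kernel code: the clock is simulated, the 4 ms tick and the per-call costs are invented, and coarse_now(), do_work() and bench() are hypothetical names standing in for ktime_get(), tmpl->do_2() and do_xor_speed().

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NS      4000000ULL	/* assumed clock granularity (4 ms) */
#define READ_COST_NS 100ULL	/* assumed cost of one clock read */
#define WORK_COST_NS 1000ULL	/* assumed cost of one work call */

static uint64_t fine_ns;	/* simulated "true" time */

/* Coarse clock: true time rounded down to the tick granularity. */
static uint64_t coarse_now(void)
{
	fine_ns += READ_COST_NS;
	return fine_ns - fine_ns % TICK_NS;
}

/* Stand-in for the benchmarked function; only advances simulated time. */
static void do_work(void)
{
	fine_ns += WORK_COST_NS;
}

struct result {
	unsigned long reps;	/* number of work calls actually made */
	uint64_t elapsed_ns;	/* interval seen by the coarse clock */
};

static struct result bench(unsigned long min_reps)
{
	uint64_t t0, start;
	unsigned long reps = 0;

	t0 = coarse_now();
	/* a) delay the start until the clock has just advanced */
	do {
		start = coarse_now();
	} while (start == t0);

	/* b) run past min_reps and keep going until the clock moves on;
	 * the clock is only read once the repetition test has failed */
	do {
		do_work();
	} while (reps++ < min_reps || (t0 = coarse_now()) == start);

	assert(t0 > start);	/* the measured interval is never zero */
	return (struct result){ reps, t0 - start };
}
```

With these made-up costs, a call such as bench(800) keeps running well beyond the minimum repetition count and reports an interval of exactly one tick, so the later division never sees a zero duration.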
--
John David Anglin dave.anglin@...l.net