Message-ID: <aedfebcb-4bca-4474-a590-b1acc37307ac@linux.ibm.com>
Date: Fri, 16 Jan 2026 21:55:04 +0100
From: Holger Dengler <dengler@...ux.ibm.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: David Laight <david.laight.linux@...il.com>,
Ard Biesheuvel <ardb@...nel.org>,
"Jason A . Donenfeld" <Jason@...c4.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Harald Freudenberger <freude@...ux.ibm.com>,
linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org
Subject: Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
On 16/01/2026 20:44, Eric Biggers wrote:
> On Fri, Jan 16, 2026 at 08:20:51PM +0100, Holger Dengler wrote:
>>>> The benchmark loops for 100 iterations now without any warm-up. In each
>>>> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
>>>> of these measurements is taken as the value for the bandwidth
>>>> calculations. Although it is not necessary in my environment, I'm doing all
>>>> iterations with preemption disabled. I think that this might help on other
>>>> platforms to reduce the jitter of the measurement values.
>>>>
>>>> The removal of the warm-up does not have any impact on the numbers.
>>>
>>> I'm not sure what the 'warm-up' was for.
>>> The first test will be slow(er) due to I-cache misses.
>>> (That will be more noticeable for big software loops - like blake2.)
>>> Changes to test parameters can affect branch prediction, but that also
>>> usually affects only the first test with each set of parameters.
>>> (Unlikely to affect AES, but I could see that effect when testing
>>> mul_u64_u64_div_u64().)
>>> The only other reason for a 'warm-up' is to get the cpu frequency fast
>>> and fixed - and there ought to be a better way of doing that.
>
> The warm-up loops in the existing benchmarks are both for cache warming
> and to get the CPU frequency fast and fixed. It's not anything
> sophisticated, but rather just something that's simple and seems to
> work well enough across CPUs without depending on any special APIs. If
> your CPU doesn't do much frequency scaling, you may not notice a
> difference, but other CPUs may need it.
Do you have a gut feeling for how many iterations it takes to get the CPU up
to speed? If it takes fewer than 50 iterations, the new method would be
sufficient.
>>>> I also did some tests with IRQs disabled (instead of only preemption), but the
>>>> numbers stay the same. So I think it is safe enough to stay with disabled
>>>> preemption.
>>>
>>> I'd actually go for disabling interrupts.
>>> What you are seeing is the effect of interrupts not happening
>>> (which is likely for a short test, but not for a long one).
>>
>> Ok, I'll send the next series with IRQs disabled. I don't see any difference on
>> my systems.
>
> Some architectures don't allow vector registers to be used when IRQs are
> disabled. On those architectures, disabling IRQs would always trigger
> the fallback to the generic code, which would make the benchmark not
> very useful. That's why I've only been disabling preemption, not IRQs.
Ok, this is a very strong argument against disabling IRQs.
--
Mit freundlichen Grüßen / Kind regards
Holger Dengler
--
IBM Systems, Linux on IBM Z Development
dengler@...ux.ibm.com