Message-ID: <20250625191701.GC1703@sol>
Date: Wed, 25 Jun 2025 12:17:01 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Maxime MERE <maxime.mere@...s.st.com>
Cc: Simon Richter <Simon.Richter@...yros.de>, linux-fscrypt@...r.kernel.org,
linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mtd@...ts.infradead.org, linux-ext4@...r.kernel.org,
linux-f2fs-devel@...ts.sourceforge.net, ceph-devel@...r.kernel.org
Subject: Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
On Wed, Jun 25, 2025 at 06:29:26PM +0200, Maxime MERE wrote:
> Hi,
>
> On 6/25/25 08:32, Eric Biggers wrote:
> > That was the synchronous throughput. However, submitting multiple requests
> > asynchronously (which again, fscrypt doesn't actually do) barely helps.
> > Apparently the STM32 crypto engine has only one hardware queue.
> >
> > I already strongly suspected that these non-inline crypto engines aren't worth
> > using. But I didn't realize they are quite this bad. Even with AES on a
> > Cortex-A7 CPU that lacks AES instructions, the CPU is much faster!
>
> From a performance perspective, using hardware crypto offloads the CPU,
> which is important in real-world applications where the CPU must handle
> multiple tasks. Our processors are often single-core and not the highest
> performing, so hardware acceleration is valuable.
>
> I can show you performance test results obtained with OpenSSL (3.2.4), which
> show lower CPU usage and better performance for large blocks of data when our
> driver is used (via afalg):
>
> command used: ```openssl speed -evp aes-256-cbc -engine afalg -elapsed```
>
> +--------------------+--------------+-----------------+
> | Block Size (bytes) | AFALG (MB/s) | SW BASED (MB/s) |
> +--------------------+--------------+-----------------+
> | 16                 | 0.09         | 9.44            |
> | 64                 | 0.34         | 11.43           |
> | 256                | 1.31         | 12.08           |
> | 1024               | 4.96         | 12.27           |
> | 8192               | 18.18        | 12.33           |
> | 16384              | 22.48        | 12.33           |
> +--------------------+--------------+-----------------+
>
> To test CPU usage I used a single-core STM32MP157F.
> With afalg, the average CPU usage is ~75%; with the software-based
> approach, CPU usage is ~100%.
>
> Maxime

fscrypt is almost always used with 4096-byte blocks, which in my benchmark took
about 1300 μs each with AES-128-CBC-ESSIV w/ STM32 engine, 264 μs each with
AES-128-CBC-ESSIV w/ CPU, or 77 μs each with Adiantum w/ CPU. The CPU-based
times seem short enough that there isn't much time for another task to be
usefully scheduled while waiting for each block. It's important to consider (a)
driver overhead, (b) scheduling overhead, and (c) the low instructions per
second of this processor in the first place.
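
For comparison with the MB/s figures in the quoted table, the per-block times above can be converted to throughput. This is only a rough sketch: it assumes MB means 10^6 bytes and ignores any time spent between blocks. The function and variable names are illustrative and not taken from any benchmark tool.

```python
# Per-block times for 4096-byte blocks, as quoted in the message above.
BLOCK_SIZE_BYTES = 4096
TIME_PER_BLOCK_US = {
    "AES-128-CBC-ESSIV w/ STM32 engine": 1300,
    "AES-128-CBC-ESSIV w/ CPU": 264,
    "Adiantum w/ CPU": 77,
}


def throughput_mb_per_s(block_size_bytes, time_us):
    """Return throughput in MB/s, assuming 1 MB = 10^6 bytes.

    Bytes per microsecond is numerically equal to 10^6 bytes per second,
    so no further scaling is needed.
    """
    return block_size_bytes / time_us


for name, time_us in TIME_PER_BLOCK_US.items():
    mb_per_s = throughput_mb_per_s(BLOCK_SIZE_BYTES, time_us)
    print(f"{name}: {mb_per_s:.1f} MB/s")
```

Under those assumptions this works out to roughly 3.2 MB/s for the STM32 engine, 15.5 MB/s for AES-128-CBC-ESSIV on the CPU, and 53.2 MB/s for Adiantum on the CPU.
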
By the way, the board I have (STM32MP157F-DK2) is actually multi-core. It seems
this is common among ST's offerings that are intended to run Linux? (Of course,
the microcontrollers that don't run Linux are another story.)

- Eric