[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAMj1kXH--PGu1C64TRas0uhGPd-k3TV+zcd0pounq0nL+5006g@mail.gmail.com>
Date: Fri, 4 Jul 2025 15:25:32 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Eric Biggers <ebiggers@...nel.org>
Cc: linux-fscrypt@...r.kernel.org, linux-crypto@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mtd@...ts.infradead.org,
linux-ext4@...r.kernel.org, linux-f2fs-devel@...ts.sourceforge.net,
ceph-devel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v3] fscrypt: Don't use problematic non-inline crypto engines
On Fri, 4 Jul 2025 at 09:05, Eric Biggers <ebiggers@...nel.org> wrote:
>
> Make fscrypt no longer use Crypto API drivers for non-inline crypto
> engines, even when the Crypto API prioritizes them over CPU-based code
> (which unfortunately it often does). These drivers tend to be really
> problematic, especially for fscrypt's workload. This commit has no
> effect on inline crypto engines, which are different and do work well.
>
> Specifically, exclude drivers that have CRYPTO_ALG_KERN_DRIVER_ONLY or
> CRYPTO_ALG_ALLOCATES_MEMORY set. (Later, CRYPTO_ALG_ASYNC should be
> excluded too. That's omitted for now to keep this commit backportable,
> since until recently some CPU-based code had CRYPTO_ALG_ASYNC set.)
>
> There are two major issues with these drivers: bugs and performance.
>
> First, these drivers tend to be buggy. They're fundamentally much more
> error-prone and harder to test than the CPU-based code. They often
> don't get tested before kernel releases, and even if they do, the crypto
> self-tests don't properly test these drivers. Released drivers have
> en/decrypted or hashed data incorrectly. These bugs cause issues for
> fscrypt users who often didn't even want to use these drivers, e.g.:
>
> - https://github.com/google/fscryptctl/issues/32
> - https://github.com/google/fscryptctl/issues/9
> - https://lore.kernel.org/r/PH0PR02MB731916ECDB6C613665863B6CFFAA2@PH0PR02MB7319.namprd02.prod.outlook.com
>
> These drivers have also similarly caused issues for dm-crypt users,
> including data corruption and deadlocks. Since Linux v5.10, dm-crypt
> has disabled most of them by excluding CRYPTO_ALG_ALLOCATES_MEMORY.
>
> Second, these drivers tend to be *much* slower than the CPU-based code.
> This may seem counterintuitive, but benchmarks clearly show it. There's
> a *lot* of overhead associated with going to a hardware driver, off the
> CPU, and back again. To prove this, I gathered as many systems with
> this type of crypto engine as I could, and I measured synchronous
> encryption of 4096-byte messages (which matches fscrypt's workload):
>
> Intel Emerald Rapids server:
> AES-256-XTS:
> xts-aes-vaes-avx512 16171 MB/s [CPU-based, Vector AES]
> qat_aes_xts 289 MB/s [Offload, Intel QuickAssist]
>
> Qualcomm SM8650 HDK:
> AES-256-XTS:
> xts-aes-ce 4301 MB/s [CPU-based, ARMv8 Crypto Extensions]
> xts-aes-qce 73 MB/s [Offload, Qualcomm Crypto Engine]
>
> i.MX 8M Nano LPDDR4 EVK:
> AES-256-XTS:
> xts-aes-ce 647 MB/s [CPU-based, ARMv8 Crypto Extensions]
> xts(ecb-aes-caam) 20 MB/s [Offload, CAAM]
> AES-128-CBC-ESSIV:
> essiv(cbc-aes-caam,sha256-lib) 23 MB/s [Offload, CAAM]
>
> STM32MP157F-DK2:
> AES-256-XTS:
> xts-aes-neonbs 13.2 MB/s [CPU-based, ARM NEON]
> xts(stm32-ecb-aes) 3.1 MB/s [Offload, STM32 crypto engine]
> AES-128-CBC-ESSIV:
> essiv(cbc-aes-neonbs,sha256-lib)
> 14.7 MB/s [CPU-based, ARM NEON]
> essiv(stm32-cbc-aes,sha256-lib)
> 3.2 MB/s [Offload, STM32 crypto engine]
> Adiantum:
> adiantum(xchacha12-arm,aes-arm,nhpoly1305-neon)
> 52.8 MB/s [CPU-based, ARM scalar + NEON]
>
> So, there was no case in which the crypto engine was even *close* to
> being faster. On the first three, which have AES instructions in the
> CPU, the CPU was 30 to 55 times faster (!). Even on STM32MP157F-DK2
> which has a Cortex-A7 CPU that doesn't have AES instructions, AES was
> over 4 times faster on the CPU. And Adiantum encryption, which is what
> actually should be used on CPUs like that, was over 17 times faster.
>
> Other justifications that have been given for these non-inline crypto
> engines (almost always coming from the hardware vendors, not actual
> users) don't seem very plausible either:
>
> - The crypto engine throughput could be improved by processing
> multiple requests concurrently. Currently irrelevant to fscrypt,
> since it doesn't do that. This would also be complex, and unhelpful
> in many cases. 2 of the 4 engines I tested even had only one queue.
>
> - Some of the engines, e.g. STM32, support hardware keys. Also
> currently irrelevant to fscrypt, since it doesn't support these.
> Interestingly, the STM32 driver itself doesn't support this either.
>
> - Free up CPU for other tasks and/or reduce energy usage. Not very
> plausible considering the "short" message length, driver overhead,
> and scheduling overhead. There's just very little time for the CPU
> to do something else like run another task or enter low-power state,
> before the message finishes and it's time to process the next one.
>
> - Some of these engines resist power analysis and electromagnetic
> attacks, while the CPU-based crypto generally does not. In theory,
> this sounds great. In practice, if this benefit requires the use of
> an off-CPU offload that massively regresses performance and has a
> low-quality, buggy driver, the price for this hardening (which is
> not relevant to most fscrypt users, and tends to be incomplete) is
> just too high. Inline crypto engines are much more promising here,
> as are on-CPU solutions like RISC-V High Assurance Cryptography.
>
> Fixes: b30ab0e03407 ("ext4 crypto: add ext4 encryption facilities")
> Cc: stable@...r.kernel.org
> Signed-off-by: Eric Biggers <ebiggers@...nel.org>
Acked-by: Ard Biesheuvel <ardb@...nel.org>
> ---
>
> Changed in v3:
> - Further improved the commit message and comment. Added data for
> STM32MP157F-DK2 and i.MX 8M Nano LPDDR4 EVK.
> - Updated fscrypt.rst
>
> Changed in v2:
> - Improved commit message and comment
> - Dropped CRYPTO_ALG_ASYNC from the mask, to make this patch
> backport-friendly
> - Added Fixes and Cc stable
>
> Documentation/filesystems/fscrypt.rst | 37 +++++++++++----------------
> fs/crypto/fscrypt_private.h | 17 ++++++++++++
> fs/crypto/hkdf.c | 2 +-
> fs/crypto/keysetup.c | 3 ++-
> fs/crypto/keysetup_v1.c | 3 ++-
> 5 files changed, 37 insertions(+), 25 deletions(-)
>
> diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
> index 29e84d125e02..4a3e844b790c 100644
> --- a/Documentation/filesystems/fscrypt.rst
> +++ b/Documentation/filesystems/fscrypt.rst
> @@ -145,13 +145,12 @@ However, these ioctls have some limitations:
> caches are freed but not wiped. Therefore, portions thereof may be
> recoverable from freed memory, even after the corresponding key(s)
> were wiped. To partially solve this, you can add init_on_free=1 to
> your kernel command line. However, this has a performance cost.
>
> -- Secret keys might still exist in CPU registers, in crypto
> - accelerator hardware (if used by the crypto API to implement any of
> - the algorithms), or in other places not explicitly considered here.
> +- Secret keys might still exist in CPU registers or in other places
> + not explicitly considered here.
>
> Full system compromise
> ~~~~~~~~~~~~~~~~~~~~~~
>
> An attacker who gains "root" access and/or the ability to execute
> @@ -404,13 +403,16 @@ of hardware acceleration for AES. Adiantum is a wide-block cipher
> that uses XChaCha12 and AES-256 as its underlying components. Most of
> the work is done by XChaCha12, which is much faster than AES when AES
> acceleration is unavailable. For more information about Adiantum, see
> `the Adiantum paper <https://eprint.iacr.org/2018/720.pdf>`_.
>
> -The (AES-128-CBC-ESSIV, AES-128-CBC-CTS) pair exists only to support
> -systems whose only form of AES acceleration is an off-CPU crypto
> -accelerator such as CAAM or CESA that does not support XTS.
> +The (AES-128-CBC-ESSIV, AES-128-CBC-CTS) pair was added to try to
> +provide a more efficient option for systems that lack AES instructions
> +in the CPU but do have a non-inline crypto engine such as CAAM or CESA
> +that supports AES-CBC (and not AES-XTS). This is deprecated. It has
> +been shown that just doing AES on the CPU is actually faster.
> +Moreover, Adiantum is faster still and is recommended on such systems.
>
> The remaining mode pairs are the "national pride ciphers":
>
> - (SM4-XTS, SM4-CBC-CTS)
>
> @@ -1324,26 +1326,17 @@ that systems implementing a form of "verified boot" take advantage of
> this by validating all top-level encryption policies prior to access.
>
> Inline encryption support
> =========================
>
> -By default, fscrypt uses the kernel crypto API for all cryptographic
> -operations (other than HKDF, which fscrypt partially implements
> -itself). The kernel crypto API supports hardware crypto accelerators,
> -but only ones that work in the traditional way where all inputs and
> -outputs (e.g. plaintexts and ciphertexts) are in memory. fscrypt can
> -take advantage of such hardware, but the traditional acceleration
> -model isn't particularly efficient and fscrypt hasn't been optimized
> -for it.
> -
> -Instead, many newer systems (especially mobile SoCs) have *inline
> -encryption hardware* that can encrypt/decrypt data while it is on its
> -way to/from the storage device. Linux supports inline encryption
> -through a set of extensions to the block layer called *blk-crypto*.
> -blk-crypto allows filesystems to attach encryption contexts to bios
> -(I/O requests) to specify how the data will be encrypted or decrypted
> -in-line. For more information about blk-crypto, see
> +Many newer systems (especially mobile SoCs) have *inline encryption
> +hardware* that can encrypt/decrypt data while it is on its way to/from
> +the storage device. Linux supports inline encryption through a set of
> +extensions to the block layer called *blk-crypto*. blk-crypto allows
> +filesystems to attach encryption contexts to bios (I/O requests) to
> +specify how the data will be encrypted or decrypted in-line. For more
> +information about blk-crypto, see
> :ref:`Documentation/block/inline-encryption.rst <inline_encryption>`.
>
> On supported filesystems (currently ext4 and f2fs), fscrypt can use
> blk-crypto instead of the kernel crypto API to encrypt/decrypt file
> contents. To enable this, set CONFIG_FS_ENCRYPTION_INLINE_CRYPT=y in
> diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
> index c1d92074b65c..6e7164530a1e 100644
> --- a/fs/crypto/fscrypt_private.h
> +++ b/fs/crypto/fscrypt_private.h
> @@ -43,10 +43,27 @@
> * hardware-wrapped keys has made it misleading as it's only for raw keys.
> * Don't use it in kernel code; use one of the above constants instead.
> */
> #undef FSCRYPT_MAX_KEY_SIZE
>
> +/*
> + * This mask is passed as the third argument to the crypto_alloc_*() functions
> + * to prevent fscrypt from using the Crypto API drivers for non-inline crypto
> + * engines. Those drivers have been problematic for fscrypt. fscrypt users
> + * have reported hangs and even incorrect en/decryption with these drivers.
> + * Since going to the driver, off CPU, and back again is really slow, such
> + * drivers can be over 50 times slower than the CPU-based code for fscrypt's
> + * workload. Even on platforms that lack AES instructions on the CPU, using the
> + * offloads has been shown to be slower, even staying with AES. (Of course,
> + * Adiantum is faster still, and is the recommended option on such platforms...)
> + *
> + * Note that fscrypt also supports inline crypto engines. Those don't use the
> + * Crypto API and work much better than the old-style (non-inline) engines.
> + */
> +#define FSCRYPT_CRYPTOAPI_MASK \
> + (CRYPTO_ALG_ALLOCATES_MEMORY | CRYPTO_ALG_KERN_DRIVER_ONLY)
> +
> #define FSCRYPT_CONTEXT_V1 1
> #define FSCRYPT_CONTEXT_V2 2
>
> /* Keep this in sync with include/uapi/linux/fscrypt.h */
> #define FSCRYPT_MODE_MAX FSCRYPT_MODE_AES_256_HCTR2
> diff --git a/fs/crypto/hkdf.c b/fs/crypto/hkdf.c
> index 0f3028adc9c7..5b9c21cfe2b4 100644
> --- a/fs/crypto/hkdf.c
> +++ b/fs/crypto/hkdf.c
> @@ -56,11 +56,11 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
> struct crypto_shash *hmac_tfm;
> static const u8 default_salt[HKDF_HASHLEN];
> u8 prk[HKDF_HASHLEN];
> int err;
>
> - hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, 0);
> + hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, FSCRYPT_CRYPTOAPI_MASK);
> if (IS_ERR(hmac_tfm)) {
> fscrypt_err(NULL, "Error allocating " HKDF_HMAC_ALG ": %ld",
> PTR_ERR(hmac_tfm));
> return PTR_ERR(hmac_tfm);
> }
> diff --git a/fs/crypto/keysetup.c b/fs/crypto/keysetup.c
> index 0d71843af946..d8113a719697 100644
> --- a/fs/crypto/keysetup.c
> +++ b/fs/crypto/keysetup.c
> @@ -101,11 +101,12 @@ fscrypt_allocate_skcipher(struct fscrypt_mode *mode, const u8 *raw_key,
> const struct inode *inode)
> {
> struct crypto_skcipher *tfm;
> int err;
>
> - tfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
> + tfm = crypto_alloc_skcipher(mode->cipher_str, 0,
> + FSCRYPT_CRYPTOAPI_MASK);
> if (IS_ERR(tfm)) {
> if (PTR_ERR(tfm) == -ENOENT) {
> fscrypt_warn(inode,
> "Missing crypto API support for %s (API name: \"%s\")",
> mode->friendly_name, mode->cipher_str);
> diff --git a/fs/crypto/keysetup_v1.c b/fs/crypto/keysetup_v1.c
> index b70521c55132..158ceae8a5bc 100644
> --- a/fs/crypto/keysetup_v1.c
> +++ b/fs/crypto/keysetup_v1.c
> @@ -50,11 +50,12 @@ static int derive_key_aes(const u8 *master_key,
> {
> int res = 0;
> struct skcipher_request *req = NULL;
> DECLARE_CRYPTO_WAIT(wait);
> struct scatterlist src_sg, dst_sg;
> - struct crypto_skcipher *tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0);
> + struct crypto_skcipher *tfm =
> + crypto_alloc_skcipher("ecb(aes)", 0, FSCRYPT_CRYPTOAPI_MASK);
>
> if (IS_ERR(tfm)) {
> res = PTR_ERR(tfm);
> tfm = NULL;
> goto out;
>
> base-commit: d0b3b7b22dfa1f4b515fd3a295b3fd958f9e81af
> --
> 2.50.0
>
>
Powered by blists - more mailing lists