linux-kernel - Re: [PATCH v4 10/24] crypto: x86/poly

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <MW5PR84MB1842E182C47F77FE8C8E755BAB139@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
Date:   Mon, 28 Nov 2022 16:57:19 +0000
From:   "Elliott, Robert (Servers)" <elliott@....com>
To:     Herbert Xu <herbert@...dor.apana.org.au>,
        Ard Biesheuvel <ardb@...nel.org>
CC:     "Jason A. Donenfeld" <Jason@...c4.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
        "ap420073@...il.com" <ap420073@...il.com>,
        "David.Laight@...lab.com" <David.Laight@...lab.com>,
        "ebiggers@...nel.org" <ebiggers@...nel.org>,
        "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption

On Fri, Nov 25, 2022 at 09:59:17AM +0100, Ard Biesheuvel wrote:
> On arm64, this is implemented in an assembler macro 'cond_yield' so we
> don't need to preserve/restore the SIMD state state at all if the
> yield is not going to result in a call to schedule(). For example, the
> SHA3 code keeps 400 bytes of state in registers, which we don't want
> to save and reload unless needed. (5f6cb2e617681 'crypto:
> arm64/sha512-ce - simplify NEON yield')

That sounds like the optimal approach. There is a cost to unnecessary
kernel_fpu_begin()/end() calls - increasing their usage in the x86
sha512 driver added 929 us during one boot. The cond_yield check is
just a few memory reads and conditional branches.

I see that is built on the asm-offsets.c technique mentioned by
Dave Hansen in the x86 aria driver thread.

> Note that it is only used in shash implementations, given that they
> are the only ones that may receive unbounded inputs.

Although typical usage probably doesn't stress this, the length of the
additional associated data presented to aead implementations is
unconstrained as well. At least in x86, they can end up processing
multiple megabytes in one chunk like the hash functions (if the
associated data is a big buffer described by a sg list created
with sg_init_one()).