[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJkfWY524YtCXebG1Nj1=wGjGJpwRMCyOQyXk4fpg3BvPM17zw@mail.gmail.com>
Date: Thu, 11 May 2023 12:02:00 -0700
From: Nathan Huckleberry <nhuck@...gle.com>
To: Heiko Stübner <heiko@...ech.de>
Cc: palmer@...belt.com, paul.walmsley@...ive.com,
aou@...s.berkeley.edu, herbert@...dor.apana.org.au,
davem@...emloft.net, conor.dooley@...rochip.com,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-crypto@...r.kernel.org, christoph.muellner@...ll.eu
Subject: Re: [PATCH v4 4/4] RISC-V: crypto: add accelerated GCM GHASH implementation
On Thu, May 11, 2023 at 3:30 AM Heiko Stübner <heiko@...ech.de> wrote:
>
> Hi Nathan,
>
> Am Dienstag, 11. April 2023, 17:00:00 CEST schrieb Nathan Huckleberry:
> > On Wed, Mar 29, 2023 at 7:08 AM Heiko Stuebner <heiko@...ech.de> wrote:
> > > +struct riscv64_ghash_ctx {
> > > + void (*ghash_func)(u64 Xi[2], const u128 Htable[16],
> > > + const u8 *inp, size_t len);
> > > +
> > > + /* key used by vector asm */
> > > + u128 htable[16];
> >
> > This field looks too big. The assembly only loads the first 128-byte
> > value from this table.
>
> OpenSSL defines the Htable field handed to the init- and the other
> functions as this "u128 Htable[16]" [0] . As I really like the concept
> of keeping in sync with openSSL, I guess I'd rather not change that.
>
> [0] https://github.com/openssl/openssl/blob/master/crypto/modes/gcm128.c#L88
>
>
> > Is this copied from another implementation? There's an optimization
> > where you precompute the first N powers of H so that you can perform 1
> > finite field reduction for every N multiplications, but it doesn't
> > look like that's being used here.
>
> The whole crypto-specific code comes from openSSL itself, so for now I
> guess I'd like to try keeping things the same.
>
>
> > > +#define RISCV64_ZBC_SETKEY(VARIANT, GHASH) \
> > > +void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]); \
> > > +static int riscv64_zbc_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm, \
> > > + const u8 *key, \
> > > + unsigned int keylen) \
> > > +{ \
> > > + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \
> > > + const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]), \
> > > + cpu_to_be64(((const u64 *)key)[1]) }; \
> > > + \
> > > + if (keylen != GHASH_BLOCK_SIZE) \
> > > + return -EINVAL; \
> > > + \
> > > + memcpy(&ctx->key, key, GHASH_BLOCK_SIZE); \
> > > + gcm_init_rv64i_ ## VARIANT(ctx->htable, k); \
> > > + \
> > > + ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH; \
> > > + \
> > > + return 0; \
> > > +}
> >
> > I'd prefer three identical functions over a macro here. Code searching
> > tools and compiler warnings are significantly worse with macros.
>
> done :-)
>
>
> > > +
> > > +static int riscv64_zbc_ghash_update(struct shash_desc *desc,
> > > + const u8 *src, unsigned int srclen)
> > > +{
> > > + unsigned int len;
> > > + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
> > > + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
> > > +
> > > + if (dctx->bytes) {
> > > + if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) {
> > > + memcpy(dctx->buffer + dctx->bytes, src,
> > > + srclen);
> > > + dctx->bytes += srclen;
> > > + return 0;
> > > + }
> > > + memcpy(dctx->buffer + dctx->bytes, src,
> > > + GHASH_DIGEST_SIZE - dctx->bytes);
> > > +
> > > + ctx->ghash_func(dctx->shash, ctx->htable,
> > > + dctx->buffer, GHASH_DIGEST_SIZE);
> > > +
> > > + src += GHASH_DIGEST_SIZE - dctx->bytes;
> > > + srclen -= GHASH_DIGEST_SIZE - dctx->bytes;
> > > + dctx->bytes = 0;
> > > + }
> > > + len = srclen & ~(GHASH_DIGEST_SIZE - 1);
> > > +
> > > + if (len) {
> > > + gcm_ghash_rv64i_zbc(dctx->shash, ctx->htable,
> > > + src, len);
> > > + src += len;
> > > + srclen -= len;
> > > + }
> > > +
> > > + if (srclen) {
> > > + memcpy(dctx->buffer, src, srclen);
> > > + dctx->bytes = srclen;
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +static int riscv64_zbc_ghash_final(struct shash_desc *desc, u8 *out)
> > > +{
> > > + int i;
> > > + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
> > > + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
> > > +
> > > + if (dctx->bytes) {
> > > + for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++)
> > > + dctx->buffer[i] = 0;
> > > + ctx->ghash_func(dctx->shash, ctx->htable,
> > > + dctx->buffer, GHASH_DIGEST_SIZE);
> >
> > Can we do this without an indirect call?
>
> hmm, the indirect call is in both riscv64_zbc_ghash_update() and
> riscv64_zbc_ghash_final() . And I found a missing one at the bottom
> of riscv64_zbc_ghash_update(), where gcm_ghash_rv64i_zbc() is
> called right now.
>
> Getting rid of the indirect call would mean duplicating both of these
> functions for all instances. Especially with the slightly higher
> complexity of the update this somehow seems not the best way to go.
Indirect calls are quite expensive. They are an issue for things like
disk/filesystem encryption because it introduces a ton of latency per
block.
I think this is a candidate for static calls. It looks like static
call support hasn't been accepted for riscv yet. Maybe just add a TODO
for now?
See:
https://lwn.net/Articles/771209/
https://lore.kernel.org/all/tencent_A8A256967B654625AEE1DB222514B0613B07@qq.com/
>
>
> Thanks for your pointers
> Heiko
>
>
Thanks,
Huck
Powered by blists - more mailing lists