Message-ID: <20231109080549.GC1245@sol.localdomain>
Date: Thu, 9 Nov 2023 00:05:49 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: Jerry Shih <jerry.shih@...ive.com>
Cc: paul.walmsley@...ive.com, palmer@...belt.com,
aou@...s.berkeley.edu, herbert@...dor.apana.org.au,
davem@...emloft.net, andy.chiu@...ive.com, greentime.hu@...ive.com,
conor.dooley@...rochip.com, guoren@...nel.org, bjorn@...osinc.com,
heiko@...ech.de, ardb@...nel.org, phoebe.chen@...ive.com,
hongrong.hsu@...ive.com, linux-riscv@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org
Subject: Re: [PATCH 06/12] RISC-V: crypto: add accelerated
AES-CBC/CTR/ECB/XTS implementations
On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> +# prepare input data (v24), IV (v28), bit-reversed IV (v16) and the
> +# bit-reversed-IV multiplier (v20)
> +sub init_first_round {
> + my $code=<<___;
> + # load input
> + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
> + @{[vle32_v $V24, $INPUT]}
> +
> + li $T0, 5
> + # We can skip the multiplier setup below when there is at most one
> + # block (`blocks <= 1`).
> + blt $LEN32, $T0, 1f
> +
> + # Note: We use `vgmul` for GF(2^128) multiplication, but `vgmul`
> + # expects its coefficients in bit-reversed order, so the data must be
> + # passed through `vbrev8` whenever `vgmul` is used.
> + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
> + @{[vbrev8_v $V0, $V28]}
> + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> + @{[vmv_v_i $V16, 0]}
> + # v16: [r-IV0, r-IV0, ...]
> + @{[vaesz_vs $V16, $V0]}
> +
> + # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
> + srli $T0, $LEN32, 2
> + @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
> + # v2: [`1`, `1`, `1`, `1`, ...]
> + @{[vmv_v_i $V2, 1]}
> + # v3: [`0`, `1`, `2`, `3`, ...]
> + @{[vid_v $V3]}
> + @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
> + # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
> + @{[vzext_vf2 $V4, $V2]}
> + # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
> + @{[vzext_vf2 $V6, $V3]}
> + srli $T0, $LEN32, 1
> + @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
> + # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
> + @{[vwsll_vv $V8, $V4, $V6]}
> +
> + # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16
> + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> + @{[vbrev8_v $V8, $V8]}
> + @{[vgmul_vv $V16, $V8]}
> +
> + # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28.
> + # Reverse the bits order back.
> + @{[vbrev8_v $V28, $V16]}

This code assumes that '1 << i' fits in 64 bits for every block index i.
Since this code runs with SEW=32 and LMUL=4, vl can be as large as
4 * VLEN / 32 = VLEN / 8 elements, i.e. VLEN / 32 blocks, so the largest
shift amount used is VLEN / 32 - 1.  Requiring that to be at most 63
works out to an implicit assumption that VLEN <= 2048.  I.e., AES-XTS
encryption/decryption would produce the wrong result on RISC-V
implementations with VLEN > 2048.
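
To make the failure mode concrete, here is a throwaway scalar model of
the multiplier setup (plain C, nothing kernel-specific; the struct and
function names are made up for illustration).  The vwsll sequence can
only set bits in the low 64-bit half of each 128-bit multiplier, while
block i needs x^i, the 128-bit value with bit i set:

	#include <stdint.h>
	#include <stdio.h>

	struct u128 { uint64_t lo, hi; };

	/* What the vwsll sequence builds for block i: low half = 1 << i. */
	static struct u128 vwsll_multiplier(unsigned int i)
	{
		/* Mask the shift so the C shift stays defined; for i >= 64
		 * the vector result is wrong no matter how it wraps. */
		struct u128 m = { UINT64_C(1) << (i & 63), 0 };
		return m;
	}

	/* What block i actually needs: the 128-bit value with bit i set. */
	static struct u128 correct_multiplier(unsigned int i)
	{
		struct u128 m = { 0, 0 };

		if (i < 64)
			m.lo = UINT64_C(1) << i;
		else
			m.hi = UINT64_C(1) << (i - 64);
		return m;
	}

	int main(void)
	{
		/* VLEN=4096, SEW=32, LMUL=4: vl up to 512 elements = 128 blocks */
		for (unsigned int i = 0; i < 128; i++) {
			struct u128 got = vwsll_multiplier(i);
			struct u128 want = correct_multiplier(i);

			if (got.lo != want.lo || got.hi != want.hi) {
				printf("first wrong multiplier: block %u\n", i);
				break;
			}
		}
		return 0;
	}

With VLEN=4096 this reports block 64 as the first wrong multiplier.
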
Perhaps it should be explicitly checked that VLEN <= 2048?
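
For example, something like this in the glue code (a sketch only; I'm
assuming a riscv_vector_vlen()-style helper that returns VLEN in bits --
substitute whatever helper is actually available):

	/*
	 * Sketch: the vwsll-based tweak setup handles at most 64 blocks
	 * per iteration, so refuse VLEN > 2048 rather than computing
	 * wrong results there.
	 */
	if (riscv_vector_vlen() > 2048)
		return -ENODEV;
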
- Eric