lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 11 Jan 2022 15:46:56 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     "'Jason A. Donenfeld'" <Jason@...c4.com>,
        Geert Uytterhoeven <geert@...ux-m68k.org>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Theodore Tso <tytso@....edu>,
        Greg KH <gregkh@...uxfoundation.org>,
        Jean-Philippe Aumasson <jeanphilippe.aumasson@...il.com>
Subject: RE: [PATCH v2 2/2] random: use BLAKE2s instead of SHA1 in extraction

From: Jason A. Donenfeld
> Sent: 11 January 2022 12:50
> 
> On Tue, Jan 11, 2022 at 1:28 PM Jason A. Donenfeld <Jason@...c4.com> wrote:
> > If you're really quite concerned about m68k code size, I can probably
> > do some things to reduce that. For example, blake2s256_hmac is only
> > used by wireguard and it could probably be made local there. And with
> > some trivial loop re-rolling, I can shave off another 2300 bytes. And
> > I bet I can find a few other things too. The question is: how
> > important is this to you?
> 
> And with another trick (see below), another extra 1000 bytes or so
> shaved off. Aside from moving blake2s256_hmac, I'm not really super
> enthusiastic about making these changes, but depending on how important
> this is to you, maybe we can make something work. There are probably
> additional possibilities too with the code.
> 
> ====
> 
> diff --git a/lib/crypto/blake2s-generic.c b/lib/crypto/blake2s-generic.c
> index 75ccb3e633e6..8e3c6372363a 100644
> --- a/lib/crypto/blake2s-generic.c
> +++ b/lib/crypto/blake2s-generic.c
> @@ -46,7 +46,7 @@ void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
>  {
>  	u32 m[16];
>  	u32 v[16];
> -	int i;
> +	int i, j;

Use unsigned int i, j;
Ensures the '% 4' are done as '& 3' and the divides as shifts.
Unless the compiler manages to track the valid values that will
even generate better code on x86-64.
(Saves a sign extension prior to the array indexes.)

>  	WARN_ON(IS_ENABLED(DEBUG) &&
>  		(nblocks > 1 && inc != BLAKE2S_BLOCK_SIZE));
> @@ -76,29 +76,11 @@ void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
>  	b = ror32(b ^ c, 7); \
>  } while (0)
> 
> -#define ROUND(r) do { \
> -	G(r, 0, v[0], v[ 4], v[ 8], v[12]); \
> -	G(r, 1, v[1], v[ 5], v[ 9], v[13]); \
> -	G(r, 2, v[2], v[ 6], v[10], v[14]); \
> -	G(r, 3, v[3], v[ 7], v[11], v[15]); \
> -	G(r, 4, v[0], v[ 5], v[10], v[15]); \
> -	G(r, 5, v[1], v[ 6], v[11], v[12]); \
> -	G(r, 6, v[2], v[ 7], v[ 8], v[13]); \
> -	G(r, 7, v[3], v[ 4], v[ 9], v[14]); \
> -} while (0)
> -		ROUND(0);
> -		ROUND(1);
> -		ROUND(2);
> -		ROUND(3);
> -		ROUND(4);
> -		ROUND(5);
> -		ROUND(6);
> -		ROUND(7);
> -		ROUND(8);
> -		ROUND(9);
> -
> +		for (i = 0; i < 10; ++i) {
> +			for (j = 0; j < 8; ++j)
> +				G(i, j, v[j % 4], v[((j + (j / 4)) % 4) + 4], v[((j + 2 * (j / 4)) % 4) + 8],
> v[((j + 3 * (j / 4)) % 4) + 12]);

I think I'd look at doing [0..3] then [4..7] to save execution time.

I actually wonder how large a block you need to be processing to get
a gain from all that unrolling on architectures like x86-64.
The 'cold cache' timing must be horrid.
Never mind the side effects of displacing that much other code from the I-cache.

There aren't enough registers to hold all the v[] values so they'll
all be memory reads - and there are probably others as well.
The other instructions will happen in parallel - even 3 or 4 for each memory read.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ