linux-kernel - Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule

Message-ID: <20201021151604.GA3750362@rani.riverdale.lan>
Date:   Wed, 21 Oct 2020 11:16:04 -0400
From:   Arvind Sankar <nivedita@...m.mit.edu>
To:     David Laight <David.Laight@...LAB.COM>
Cc:     'Arvind Sankar' <nivedita@...m.mit.edu>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "David S. Miller" <davem@...emloft.net>,
        "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and
 message schedule

On Tue, Oct 20, 2020 at 09:36:00PM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 20 October 2020 21:40
> > 
> > Putting the round constants and the message schedule arrays together in
> > one structure saves one register, which can be a significant benefit on
> > register-constrained architectures. On x86-32 (tested on Broadwell
> > Xeon), this gives a 10% performance benefit.
> 
> I'm actually stunned it makes that much difference.
> The object code must be truly horrid (before and after).
> 
> There are probably other strange tweaks that give a similar
> improvement.
> 
> 	David
> 

Hm yes, I took a closer look at the generated code, and gcc seems to be
doing something completely braindead. Before this change, it actually
copies 8 words at a time from SHA256_K onto the stack and uses those
stack temporaries for the calculation. So this patch helps only because
it does that copy once instead of on every trip around the loop.
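The patch body is not quoted in this message, so what follows is only a minimal sketch of the idea under stated assumptions, not the patch's actual code. A hypothetical `struct sha256_sched` holds the 64 round constants and the 64-word message schedule side by side, and a hypothetical `sha256_block()` copies `SHA256_K` into it once per block rather than restaging constants inside the round loop. It uses portable `uint32_t` instead of the kernel's `u32`; `sha256_init()` is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint32_t SHA256_K[64] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

/* Hypothetical combined layout: one base pointer reaches both arrays. */
struct sha256_sched {
	uint32_t k[64];	/* round constants, copied in once per block */
	uint32_t w[64];	/* message schedule */
};

/* w[t] sits a fixed 256 bytes after k[t]. */
static_assert(offsetof(struct sha256_sched, w) == 64 * sizeof(uint32_t),
	      "w must directly follow k");

static inline uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* Load the standard SHA-256 initial hash value. */
static void sha256_init(uint32_t state[8])
{
	static const uint32_t iv[8] = {
		0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
		0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
	};

	memcpy(state, iv, sizeof(iv));
}

/* Compress one 64-byte block into state, using s as scratch space. */
static void sha256_block(uint32_t state[8], const uint8_t in[64],
			 struct sha256_sched *s)
{
	uint32_t a, b, c, d, e, f, g, h, t1, t2;
	int t;

	/* Copy the constants once, outside the round loop. */
	memcpy(s->k, SHA256_K, sizeof(s->k));

	/* Message schedule: 16 big-endian input words, then 48 derived ones. */
	for (t = 0; t < 16; t++)
		s->w[t] = (uint32_t)in[4 * t] << 24 | (uint32_t)in[4 * t + 1] << 16 |
			  (uint32_t)in[4 * t + 2] << 8 | (uint32_t)in[4 * t + 3];
	for (t = 16; t < 64; t++) {
		uint32_t s0 = ror32(s->w[t - 15], 7) ^ ror32(s->w[t - 15], 18) ^
			      (s->w[t - 15] >> 3);
		uint32_t s1 = ror32(s->w[t - 2], 17) ^ ror32(s->w[t - 2], 19) ^
			      (s->w[t - 2] >> 10);

		s->w[t] = s1 + s->w[t - 7] + s0 + s->w[t - 16];
	}

	a = state[0]; b = state[1]; c = state[2]; d = state[3];
	e = state[4]; f = state[5]; g = state[6]; h = state[7];

	/* 64 rounds: k[t] and w[t] are both read through the same pointer s. */
	for (t = 0; t < 64; t++) {
		t1 = h + (ror32(e, 6) ^ ror32(e, 11) ^ ror32(e, 25)) +
		     ((e & f) ^ (~e & g)) + s->k[t] + s->w[t];
		t2 = (ror32(a, 2) ^ ror32(a, 13) ^ ror32(a, 22)) +
		     ((a & b) ^ (a & c) ^ (b & c));
		h = g; g = f; f = e; e = d + t1;
		d = c; c = b; b = a; a = t1 + t2;
	}

	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}
```

For the one-block message "abc", pad it to 64 bytes (0x80 right after the data, the bit length 24 in the last byte), call `sha256_init()` and then `sha256_block()`; the state then holds the standard SHA-256 digest of "abc", which begins 0xba7816bf. Whether a compiler actually keeps a single base register for both arrays depends on the target and optimization level; the sketch only shows the layout the patch description implies.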

It doesn't even really need a register to hold SHA256_K: since this isn't
PIC code, it could access the table directly as SHA256_K(%ecx) if it just
multiplied the loop counter i by 4.
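To make the index arithmetic concrete: with an absolute (non-PIC) operand such as SHA256_K(%ecx), the register has to hold the byte offset 4 * i rather than the element index i. A minimal C illustration, using a hypothetical eight-entry table `k_first8` (the first eight SHA-256 round constants) and a hypothetical helper `load_by_byte_offset()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The byte offset equals i * 4 only because each entry is 4 bytes wide. */
static_assert(sizeof(uint32_t) == 4, "SHA-256 words are 32 bits");

/* Hypothetical table: the first eight SHA-256 round constants. */
static const uint32_t k_first8[8] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
};

/*
 * Read entry i the way an absolute operand like "k_first8(%ecx)" does:
 * a fixed base address plus a register holding the byte offset i * 4.
 */
static uint32_t load_by_byte_offset(size_t i)
{
	size_t byte_off = i * sizeof(uint32_t);

	return *(const uint32_t *)((const char *)k_first8 + byte_off);
}
```

On x86 the multiply can even be folded into the addressing mode with a scaled-index operand such as SHA256_K(,%ecx,4), which reads element i directly while %ecx holds i itself.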