Message-ID: <20201021151604.GA3750362@rani.riverdale.lan>
Date: Wed, 21 Oct 2020 11:16:04 -0400
From: Arvind Sankar <nivedita@...m.mit.edu>
To: David Laight <David.Laight@...LAB.COM>
Cc: 'Arvind Sankar' <nivedita@...m.mit.edu>,
Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and
message schedule
On Tue, Oct 20, 2020 at 09:36:00PM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 20 October 2020 21:40
> >
> > Putting the round constants and the message schedule arrays together in
> > one structure saves one register, which can be a significant benefit on
> > register-constrained architectures. On x86-32 (tested on Broadwell
> > Xeon), this gives a 10% performance benefit.
>
> I'm actually stunned it makes that much difference.
> The object code must be truly horrid (before and after).
>
> There are probably other strange tweaks that give a similar
> improvement.
>
> David
>
Hm yes, I took a closer look at the generated code, and gcc seems to be
doing something completely braindead. Before this change, it actually
copies 8 words at a time from SHA256_K onto the stack, and uses those
stack temporaries for the calculation. So this patch is giving a benefit
just because it only does the copy once instead of every time around the
loop.
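To make the two layouts concrete, here is a minimal sketch, not the code from the patch. The names K_DEMO, struct round_data, round_data_init, sum_separate and sum_combined are made up for illustration, the table is cut down to the first four SHA-256 round constants, and the "round" is just an add so the two variants are easy to compare. In the combined variant the constants are copied exactly once, next to the message schedule, and the loop then needs only one base pointer.

```c
#include <stdint.h>
#include <string.h>

#define NROUNDS 4 /* illustration only; SHA-256 itself uses 64 rounds */

/* First four SHA-256 round constants; the real SHA256_K table has 64. */
static const uint32_t K_DEMO[NROUNDS] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
};

/* Hypothetical combined layout: constants and schedule share one object. */
struct round_data {
	uint32_t k[NROUNDS];
	uint32_t w[NROUNDS];
};

/* Separate arrays: the loop refers to two distinct objects, K_DEMO and w. */
uint32_t sum_separate(const uint32_t *w)
{
	uint32_t acc = 0;

	for (int i = 0; i < NROUNDS; i++)
		acc += K_DEMO[i] + w[i];
	return acc;
}

/* One-time setup: copy the constants once, alongside the schedule. */
void round_data_init(struct round_data *rd, const uint32_t *w)
{
	memcpy(rd->k, K_DEMO, sizeof(rd->k));
	memcpy(rd->w, w, sizeof(rd->w));
}

/* Combined layout: both arrays are reached from the single pointer rd. */
uint32_t sum_combined(const struct round_data *rd)
{
	uint32_t acc = 0;

	for (int i = 0; i < NROUNDS; i++)
		acc += rd->k[i] + rd->w[i];
	return acc;
}
```

Both functions compute the same value for the same schedule; only the number of base addresses the loop has to keep track of differs. Whether a given compiler turns that into a measurable gain has to be checked in the generated code.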
It doesn't even really need a register to hold SHA256_K: since this isn't
PIC code, it could just access the table directly as SHA256_K(%ecx) if it
scaled the loop counter i by 4.
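As a rough C-level picture of that addressing, the sketch below walks a static table with a counter that is already a byte offset, so each load is just "table address plus counter". The names K_TAB, sum_by_index and sum_by_byte_offset are hypothetical, and the table holds only the first four round constants. Which instructions a compiler actually emits for either loop depends on the compiler and its flags, so treat this as an illustration of the idea rather than a prediction of the output.

```c
#include <stddef.h>
#include <stdint.h>

/* First four SHA-256 round constants; the real SHA256_K table has 64. */
static const uint32_t K_TAB[4] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
};

/* Element index: the address of each load is K_TAB + 4 * i. */
uint32_t sum_by_index(void)
{
	uint32_t acc = 0;

	for (size_t i = 0; i < sizeof(K_TAB) / sizeof(K_TAB[0]); i++)
		acc += K_TAB[i];
	return acc;
}

/* Byte offset: the counter is pre-scaled, so the address is K_TAB + off. */
uint32_t sum_by_byte_offset(void)
{
	uint32_t acc = 0;

	for (size_t off = 0; off < sizeof(K_TAB); off += sizeof(K_TAB[0]))
		acc += *(const uint32_t *)((const unsigned char *)K_TAB + off);
	return acc;
}
```

In non-PIC code the address of a static table is a link-time constant, so it can be folded into the displacement of the load and only the counter has to live in a register.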