lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 10 Jun 2007 12:47:19 -0400
From:	Benjamin Gilbert <bgilbert@...cmu.edu>
To:	Matt Mackall <mpm@...enic.com>
CC:	Jeff Garzik <jeff@...zik.org>, akpm@...ux-foundation.org,
	herbert@...dor.apana.org.au, linux-crypto@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

Matt Mackall wrote:
> On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
>> It's not just the loop unrolling; it's the register allocation and 
>> spilling.  For comparison, I built SHATransform() from the 
>> drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and 
>> SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty 
>> close to what you tested back then.  The resulting code is 49% MOV 
>> instructions, and 80% of *those* involve memory.  gcc4 is somewhat 
>> better, but it still spills a whole lot, both for the 2.6.11 unrolled 
>> code and for the current lib/sha1.c.
> 
> Wait, your benchmark is comparing against the unrolled code?

No, it's comparing the current lib/sha1.c to the optimized code in the 
patch.  I was just pointing out that the unrolled code you were likely 
testing against, back then, may not have been very good.  (Though I 
assumed that you were talking about the unrolled code in random.c, not 
the code in CryptoAPI, so that might change the numbers some.  It 
appears from the post you linked below that the unrolled CryptoAPI code 
still beat the rolled version?)

> How big is the -code- footprint?

About 3700 bytes for the 32-bit version of sha_transform().

> Whoa. We've regressed something horrible here:
> 
> http://groups.google.com/group/linux.kernel/msg/fba056363c99d4f9?dmode=source&hl=en
> 
> In 2003, I was getting 17MB/s out of my Athlon. Now I'm getting 2.7MB/s.
> Were your tests with or without the latest /dev/urandom fixes? This
> one in particular:
> 
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.21.y.git;a=commitdiff;h=374f167dfb97c1785515a0c41e32a66b414859a8

I'm not in front of that machine right now; I can check tomorrow.  For 
what it's worth, I've seen equivalent performance (a few MB/s) on a 
range of fairly-recent kernels.

--Benjamin Gilbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ