phc-discussions - Re: [PHC] TigerKDF paper and code ready for review

Message-ID: <CAOLP8p64o04TuzcQN3t6En7jx_8dJNR-u7WMi+oMk0dx29Vm6w@mail.gmail.com>
Date: Fri, 7 Mar 2014 23:24:56 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] TigerKDF paper and code ready for review

On Fri, Mar 7, 2014 at 10:03 PM, Solar Designer <solar@...nwall.com> wrote:
> On Fri, Mar 07, 2014 at 08:56:31PM -0500, Bill Cox wrote:
>> On Fri, Mar 7, 2014 at 7:13 PM, Solar Designer <solar@...nwall.com> wrote:
>> >> At least with AVX2 on Haswell, I would be surprised if Bcrypt's inner
>> >> loop were faster, so for hashing out of just L1 cache, I'm probably
>> >> good on that platform vs current GPUs.
>> >
>> > Are you trying to say that on Haswell you read 64 byte blocks as rapidly
>> > as bcrypt reads 4 byte blocks (also on Haswell)?  I think this is false.
>>
>> I need to verify this, but I think the math last time showed that a
>> loop with 3 AVX2 instructions reading 2 32 byte values at a time,
>> hashing them slightly, and writing them back ran at the clock speed.
>> I should have said 32-bytes, not 64.
>
> OK, you might be right.
>
> For my optimized defensive implementation of bcrypt running on the
> i7-4770K, I am getting 603 c/s at $2a$05 when running one thread (and
> one instance), and 4330 c/s when running 8 threads.  (FWIW, attack
> optimized implementations, with 2 or 3 instances per thread, are twice
> faster when running one thread, or 1.5 times faster when running 8
> threads.)
>
> This gives:
>
> 603*34000*16*4/(3.9*10^9) = 0.336 lookups/cycle
> 4330*34000*16*4/(3.7*10^9) = 2.55 lookups/cycle
>
> where 34000 is roughly the number of Blowfish encryptions performed,
> 16 is the number of Blowfish rounds, and 4 is the number of S-box
> lookups per round.
>
> (For attack, we'd have about 4 lookups/cycle on this CPU, meaning about
> 1 lookup/cycle/core.)
>
> BTW, the defensive single-thread result above is pretty bad in how it
> compares against historical Blowfish cycles per byte speeds.  It is:
>
> 3.9*10^9 / (34000*8*603) = 23.78 cpb
>
> whereas the original Pentium did 18 cpb:
>
> https://www.schneier.com/blowfish-speed.html
>
> and my own assembly code for bcrypt on the original Pentium was in between:
>
> 120*10^6 / (34000*8*20.1) = 21.95 cpb
>
> (This is for defensive, thread-safe code; attack and thread-unsafe code
> was slightly faster, and is 1.5x to 2x faster on Haswell.)
>
> Perhaps there's some room for further optimization/tuning for Haswell,
> but overall these numbers should be about right.
>
> Alexander
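
For anyone who wants to re-run the arithmetic quoted above, here is a minimal sketch in Python.  The function names are mine and purely illustrative; the constants (roughly 34000 Blowfish encryptions per hash at $2a$05, 16 rounds, 4 S-box lookups per round) and the measured c/s and clock figures are taken from Alexander's message, and the 8-byte divisor is the Blowfish block size.

```python
# Rough re-derivation of the quoted bcrypt throughput figures.
# Function names are hypothetical; constants come from the quoted message.

ENCRYPTIONS_PER_HASH = 34000  # "roughly", per the quoted message ($2a$05)
ROUNDS = 16                   # Blowfish rounds per encryption
LOOKUPS_PER_ROUND = 4         # S-box lookups per round
BLOCK_BYTES = 8               # Blowfish block size in bytes


def lookups_per_cycle(hashes_per_sec, clock_hz):
    """S-box lookups completed per clock cycle at a given hash rate."""
    lookups_per_sec = (hashes_per_sec * ENCRYPTIONS_PER_HASH
                       * ROUNDS * LOOKUPS_PER_ROUND)
    return lookups_per_sec / clock_hz


def cycles_per_byte(clock_hz, hashes_per_sec):
    """Clock cycles spent per byte of Blowfish output."""
    bytes_per_sec = ENCRYPTIONS_PER_HASH * BLOCK_BYTES * hashes_per_sec
    return clock_hz / bytes_per_sec


if __name__ == "__main__":
    # i7-4770K, defensive implementation: 1 thread, then 8 threads
    print("1 thread : %.3f lookups/cycle" % lookups_per_cycle(603, 3.9e9))
    print("8 threads: %.2f lookups/cycle" % lookups_per_cycle(4330, 3.7e9))
    # Cycles per byte: Haswell single thread, then original Pentium at 120 MHz
    print("Haswell  : %.2f cpb" % cycles_per_byte(3.9e9, 603))
    print("Pentium  : %.2f cpb" % cycles_per_byte(120e6, 20.1))
```

Since 34000 is itself an approximation, the results are only good to about two or three significant digits.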

After three glasses of wine, all I can say is... ouch, that hurts!

I really shouldn't post after drinking, which frankly gets me in
trouble (there's a guy I called a dork on a forum, when I probably
shouldn't have).

So, this is a good time to do personality analysis of your code,
because normally I'd have slightly more sense.  You indent by tab,
which is one keystroke, there are very few comments, and you like 1
letter variable names.  This indicates you feel typing speed is a
limiting factor, and I bet you type pretty fast.  I read a very long
function you wrote that could have been broken up, but that would also
have slowed you down, as there were no common sub-functions to factor
out, or you would have.  Neither comments nor long variable names
would be useful to you, because the code is perfectly clear without
them.  Given all this, I'd bet you also have at least two
screens.  One just wouldn't be enough.

Your APIs in header files are very clear, and I bet you focus on that.

I've seen code like this before, just one other time, and I've read a
crazy amount of code.  Actually, this other guy might have a bit of an
edge on you (and way past me) in terms of writing very efficient code
quickly and accurately.  He hit his head on a rock as a child and the
way his brain rewired itself turned him into a coding genius (though I
don't know what he'd be like without the rock):

    http://en.wikipedia.org/wiki/Synplicity

I went to work for Ken in 1996 at Synplicity, hoping I could learn to
be more like him, but my brain isn't wired like his.  I've got some
fun stories about him.  I was the first coder he hired, though there
were 24 employees already, because he could handle the coding load by
himself for a long time.  There were 350,000 lines of hardware
description language synthesis code, and he hired me to help out with
it.  I opened one of the more interesting files (the factoring engine,
which blew doors on competitors), which was 2,400 lines long, most of
it a single function, with no comments, tabs for indentation, and he
actually ran out of good 1-letter variable names and started using
2-letter names.  I figured he must just put the comments somewhere
else, like header files, so I grepped the whole code base for /*...
there was just one, and it said "This is a hack".

He had two monitors covered in maybe a couple hundred icons, most of
which ran custom scripts.  He had the build icon in release mode, and
an icon to publish for customer download, but no debug icon I ever
saw him use (it was probably a lonely unused icon).  He would wake up,
write a piece of code that blew away everything in the field, hit the
compile in release mode button until his typos were fixed, maybe (and
not always) run it once, and then hit the release to customers button.
He's like Mozart with code.  I found a mistake he made once, which he
assumed was me being stupid, and when I turned out to be right, I made
it onto his "somebody worth working with" list, and thus the job later on.
I don't know if he remembers details like that... he forgot I was the
first programmer he hired.

Anyway, he owned over 50% of Synplicity before dilution to go public,
which is about the best I ever saw.  He's got a lot of money now, but
you'd never know it... he's still the same guy.  Just now he drives a
Tesla Model S.

Anyway, I'm used to seeing code like this.

Bill