Date: Fri, 18 Apr 2014 11:53:33 -0400
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Non-temporal writes and uninitialized memory

I've been banging my head against a crazy problem for some time.  Using
non-temporal writes, I should be able to speed up TwoCats.  Nope!  Nothing
worked, and I tried many combinations.

Here's what I think is going on.  When I write hash data to a block of
uninitialized memory that I allocated with malloc (or posix_memalign),
somehow the CPU knows this, and therefore it does not bother to read the
cache line, modify it, and write it back, as it normally does.  Instead, it
just buffers writes until a cache line is full, and then it writes that
whole cache line to cache.
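
One rough way to probe this hypothesis is to time a first pass over freshly allocated memory against a second pass over the same memory. The following is only a sketch: `timed_fill` and `compare_passes` are hypothetical helper names, and the plain store loop is a stand-in for the real hashing, not TwoCats code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: write n 64-bit words with ordinary stores and
   return the elapsed CPU time in seconds. */
double timed_fill(uint64_t *mem, size_t n, uint64_t value)
{
    clock_t start = clock();
    for (size_t i = 0; i < n; i++)
        mem[i] = value + i;            /* stand-in for writing hash data */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: time a first pass over memory fresh from malloc
   and a second pass over the same, now initialized, memory.
   Returns 0 on success, -1 if n is 0 or the allocation fails. */
int compare_passes(size_t n, double *first, double *second)
{
    if (n == 0)
        return -1;
    uint64_t *mem = malloc(n * sizeof *mem);
    if (mem == NULL)
        return -1;
    *first = timed_fill(mem, n, 1);    /* memory never written before */
    *second = timed_fill(mem, n, 2);   /* memory already initialized */
    /* Read a value back so the stores have an observable effect. */
    assert(mem[n - 1] == 2 + (uint64_t)(n - 1));
    free(mem);
    return 0;
}
```

Calling `compare_passes` with a buffer much larger than the last-level cache and comparing the two times gives a first indication of how differently the two cases behave on a given machine.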

When I use non-temporal writes in my inner loop, and then repeat my whole
memory hashing many times in an outer loop, I find that non-temporal writes
help a ton.  When I run my memory hashing just once, the non-temporal writes
actually slow me down!  The reason is that I have to fool the CPU into
doing a non-temporal write while keeping the written data in cache.  I do
this with a separate write to a buffer that remains in cache all the time.
This combination is much better than not doing non-temporal writes when
writing to memory that is already initialized, and much worse when writing
to uninitialized memory.
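
A minimal sketch of that combination, assuming SSE2 and 16-byte-aligned memory. The helper name `hash_pass_streaming`, the buffer size `CACHE_WORDS`, and the mixing step are all hypothetical placeholders and are not taken from TwoCats:

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

#define CACHE_WORDS 64   /* hypothetical: 64 * 16 bytes = 1 KiB buffer */

/* Hypothetical helper: write n 128-bit words to mem with non-temporal
   stores, and mirror the most recent CACHE_WORDS of them into recent[]
   with ordinary stores so they stay available in cache.
   The caller must initialize recent[] before the first call. */
void hash_pass_streaming(__m128i *mem, size_t n,
                         __m128i recent[CACHE_WORDS], __m128i state)
{
    /* _mm_stream_si128 requires a 16-byte-aligned destination. */
    assert(((uintptr_t)mem & 15) == 0);
    const __m128i step = _mm_set1_epi32(0x01234567);

    for (size_t i = 0; i < n; i++) {
        size_t slot = i % CACHE_WORDS;
        /* Placeholder mixing that reads only the small cached buffer. */
        state = _mm_xor_si128(_mm_add_epi32(state, step), recent[slot]);
        recent[slot] = state;              /* ordinary store, stays cached */
        _mm_stream_si128(&mem[i], state);  /* non-temporal store to memory */
    }
    _mm_sfence();  /* make the streamed stores visible before returning */
}
```

The small buffer is what lets later steps reuse recent values without reading them back from the large array that was just written around the cache.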

Non-temporal loads for some reason never help at all.  Here's the
non-temporal store intrinsic I use to speed up writing to previously
initialized memory:

    _mm_stream_si128(p++, value);
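
For context, here is a self-contained sketch of how that call might sit in a loop. The wrapper name `stream_fill` is hypothetical; the alignment check and the trailing fence reflect the documented requirements of the SSE2 intrinsic rather than anything specific to TwoCats:

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_sfence */

/* Hypothetical helper: store `count` copies of `value` starting at p
   using non-temporal stores. */
void stream_fill(__m128i *p, size_t count, __m128i value)
{
    /* The destination must be 16-byte aligned. */
    assert(((uintptr_t)p & 15) == 0);
    while (count--)
        _mm_stream_si128(p++, value);
    /* Non-temporal stores are weakly ordered; fence before other code
       relies on the data being visible. */
    _mm_sfence();
}
```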

TwoCats currently has no method for writing to previously initialized
memory, so it's no help to me.  Some of the other entries, like yescrypt
and Lyra2, should be able to benefit from it, but only in the second loop,
not in the first.

Bill
