Date: Tue, 31 Dec 2013 06:52:01 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Initial hashing function. Feedback welcome

On Tue, Dec 31, 2013 at 12:45 AM, Solar Designer <solar@...nwall.com> wrote:

> "Unfortunately, reducing N such that the array would fit in the L1 data
> cache prevented automatic parallelization from working.  What could the
> reason for this be?  I actually tried forcing the compiler into
> parallelizing the loop even for smaller values of N by using OpenMP
> directives, and the performance was poor - all 8 threads running, but so
> slowly that there was no advantage overall.  Apparently, this slowness
> was caused by cache coherence overhead.  I expected this problem to occur
> with multiple threads accessing the same cache line.  My test results
> demonstrated that it was worse than that: if a thread wrote to a given
> page of memory, accesses to that entire page by other threads would be
> slow (entry thrown out of TLBs maybe?)  I did not spend time to confirm
> (or disprove) this guess via the CPU's performance counters, though.
>
> The problem did not occur for read-only accesses [...]"
>
> For a non-artificial program where this issue occurred as well, read the
> "Long story short [...]" paragraph further on the same wiki page.  I ran
> into (what felt like) this same issue several times since then, testing
> on different machines (and CPU types), with the same workaround working.
>

Thanks for this tip!  My initial version, written before I read about your
idea of hashing all of memory at once rather than giving each thread its
own N/p region, performed much better with multiple threads.  It has been
driving me crazy that I'm having trouble getting decent performance out of
the new version.

I'll try giving each thread a large contiguous region as its write region.
