Message-ID: <87sixzkglp.fsf@wolfjaw.dfranke.us>
Date: Fri, 23 Aug 2013 21:44:18 -0400
From: Daniel Franke <dfoxfranke@...il.com>
To: discussions@...sword-hashing.net
Subject: Some updates on EARWORM

Here are some updates on EARWORM.  See
http://article.gmane.org/gmane.comp.security.phc/226 if you missed the
original thread.

* I've cut CHUNK_WIDTH from 4 to 2, leaving CHUNK_LENGTH at 128. Testing
  on my workstation seems to indicate that neither the reduction in
  internal parallelism nor the increased frequency of random memory
  accesses results in any performance penalty.

* I wrote a GPU implementation of EARWORM today. It computes batches of
  256 workunits of a single hash computation, doing the initial and
  final PRF computations sequentially on the host, while farming out the
  expensive main loops to the GPU for parallel execution.  A 25600
  workunit computation over a 256MiB arena takes about 3.05 seconds on
  my Radeon 7850. The same computation on two CPU threads takes about
  2.25 seconds. I was struck by how similar these numbers are.

* Answering bitweasil's question from earlier:

  > What does performance look like if you're doing a parallel build on
  > half your cores?  That should stress the memory system significantly
  > without choking CPU throughput too badly, and would resemble the type
  > of activity a busy webserver would likely see.  It may be the case
  > that the memory access to compute ratio is such that there is no
  > penalty for this - worth trying, though.

  I ran some EARWORM benchmarks while simultaneously building coreutils
  with -j 4 and streaming music from Pandora.  The benchmarks typically
  took about 10% longer to run than they would on a quiet system, though
  occasionally as much as 50% longer, due, I guess, to some quirk in the
  Linux scheduler or maybe my memory controller.  There was never any
  interruption in the Pandora audio.

* I'll be posting reference and AES-NI-optimized implementations of
  EARWORM to GitHub this weekend, as soon as I'm done with some
  finishing touches on the test harness.  Note that these
  implementations are designed for benchmarking and validation only, and
  lack the user-level API that I plan to include in later
  implementations. (Obviously, nobody should be using EARWORM in
  production right now anyway!)

* The GPU implementation is currently a disgusting hairball and won't be
  included in this initial release. I'll eventually get around to
  cleaning it up, but my next major task for EARWORM is to write a spec,
  and I don't plan to do much more work on the software until that's in
  good shape.
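
For a rough sense of how close the GPU and CPU timings quoted above are,
the figures can be reduced to per-workunit costs. This is only a
back-of-the-envelope sketch built from the numbers in this message; the
variable names are illustrative and not taken from the EARWORM code, and
it assumes the 25600-workunit run was split into full batches of 256.

```python
# Figures quoted in the message above; all names here are illustrative.
WORKUNITS = 25600      # workunits in the benchmarked computation
BATCH_SIZE = 256       # workunits per GPU batch
GPU_SECONDS = 3.05     # Radeon 7850
CPU_SECONDS = 2.25     # two CPU threads

# Number of GPU batches, assuming the run divides into full batches.
batches = WORKUNITS // BATCH_SIZE

# Average wall-clock cost per workunit, in microseconds.
gpu_us_per_workunit = GPU_SECONDS / WORKUNITS * 1e6
cpu_us_per_workunit = CPU_SECONDS / WORKUNITS * 1e6

# How much longer the GPU run took relative to the two-thread CPU run.
gpu_to_cpu_ratio = GPU_SECONDS / CPU_SECONDS

print(f"batches of {BATCH_SIZE}: {batches}")
print(f"GPU: {gpu_us_per_workunit:.1f} us per workunit")
print(f"CPU: {cpu_us_per_workunit:.1f} us per workunit")
print(f"GPU/CPU time ratio: {gpu_to_cpu_ratio:.2f}")
```

These are whole-run averages, so they say nothing about how the time
splits between the host-side PRF steps and the main loops on the GPU.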
