Date: Tue, 31 Dec 2013 06:46:42 +0400
From: Solar Designer <>
Subject: Re: [PHC] Best RNG for filling memory?

On Wed, Dec 25, 2013 at 07:10:57PM -0800, Tony Arcieri wrote:
> On Wed, Dec 25, 2013 at 11:27 AM, Samuel Neves <> wrote:
> > hydra7 (Intel Sandy Bridge) seems to have AES-NI implementations, which
> >  are still slower than Chacha8:
> >
> Nice, thanks for the pointer! It's quite interesting to see ChaCha8 is
> indeed faster than AES-128 on these architectures.

Curious indeed, but this might not hold when two threads per core are
run (with HT).

I am getting better speeds for AES-NI on Sandy Bridge when sufficient
parallelism is present.  For example, OpenSSL's built-in benchmark gives
55 GB/s for ECB mode on 2x E5-2670 (32 threads on 16 cores), at a clock
rate no higher than 3.0 GHz (max turbo with all cores in use).

[ ~]$ openssl speed -multi 32 -evp aes-128-ecb
evp           14680673.33k 36977213.57k 51997107.88k 54451153.58k 55059131.05k

3*10^9*16/(55059131*1024) = 0.85 cycles/byte per core
3*10^9*32/(55059131*1024) = 1.70 cycles/byte per thread
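The two figures above can be reproduced with a short Python sketch.  The
function and variable names are hypothetical, chosen only for illustration.
The 1024 multiplier for the "k" suffix is taken from the arithmetic in this
message; if "k" in the openssl speed output instead means 1000 bytes, pass
k=1000 and the results come out roughly 2.4% higher.

```python
def cycles_per_byte(clock_hz, units, k_bytes_per_sec, k=1024):
    """Cycles spent per byte by each of `units` cores (or threads).

    clock_hz        -- assumed clock rate with all cores busy
    units           -- number of cores or hardware threads sharing the load
    k_bytes_per_sec -- aggregate throughput as printed by openssl speed
    k               -- bytes per "k" (1024 here, following the message)
    """
    return clock_hz * units / (k_bytes_per_sec * k)


CLOCK_HZ = 3.0e9            # 3.0 GHz, the max all-core turbo quoted above
THROUGHPUT_K = 55059131.05  # last column of the openssl speed output

per_core = cycles_per_byte(CLOCK_HZ, 16, THROUGHPUT_K)
per_thread = cycles_per_byte(CLOCK_HZ, 32, THROUGHPUT_K)

print(f"{per_core:.2f} cycles/byte per core")
print(f"{per_thread:.2f} cycles/byte per thread")
```

The per-thread number is simply twice the per-core number, since two
hardware threads share each core.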

If I run only one thread per core (rather than two), it will be
somewhere in between (much like the SUPERCOP results referenced above).

Will ChaCha8 get below 0.85 cpb per core or 1.70 cpb per thread with
2 threads/core on SB?  This is unclear - would need to test.

