Date: Tue, 4 Mar 2014 22:54:39 +0400
From: Solar Designer <>
Subject: Re: [PHC] wider integer multiply on 32-bit x86

On Tue, Mar 04, 2014 at 06:40:40AM +0000, Samuel Neves wrote:
> On 04-03-2014 05:58, Solar Designer wrote:
> > So if I do a MUL, immediately save EDX:EAX to other registers, and
> > follow that with another similar MUL, the second MUL would be able to
> > proceed out-of-order, before the first one completes, correct?  (As long
> > as there's no data dependency between the two MULs, indeed.  Only the
> > same ISA registers used, which a renamer might resolve.)
> Correct. Here's an example where it is clearly visible:
>     mul rbx
>     mov rsi, rax
>     mov rax, rbx
>     mul rbx
>     mov rdi, rax
>     mov rax, rbx
> When put in a loop, this sequence consumes ~3.75 Sandy Bridge cycles per
> iteration. If you remove the 'mov rax, rbx' lines, it grows to ~6.5 (due
> to dependencies). There is still some overhead involved: a perfect loop
> should only require ~2 cycles per iteration. Haswell comes very close,
> at 2.12 (due to register moves also being eliminated at the renaming
> phase). I have no old Pentiums around to check how well renaming works
> there, though.

Thanks.  If you send me a complete test program that I can compile and
run on Linux, I'll test on P2, P3, P4.  (No renaming on P1.)

