Message-ID: <829439761.283784.1395283172949.open-xchange@email.1and1.com>
Date: Wed, 19 Mar 2014 21:39:32 -0500 (CDT)
From: Steve Thomas <steve@...tu.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Supporting AVX2/SSE2 or not with a single binary
> On March 19, 2014 at 8:09 PM Andy Lutomirski <luto@...capital.net> wrote:
>
> On Wed, Mar 19, 2014 at 5:54 PM, Bill Cox <waywardgeek@...il.com> wrote:
> > One reason I think we see applications running without SSE/AVX2
> > support is that operating systems don't want to support two versions
> > of a binary, and they have to support older machines. The Blake2 code
> > I've read does not provide for a single binary that supports both - I
> > have to link either to the blake2-ref code or the blake2-sse code. My
> > TwoCats code has inherited this limitation, since I used the Blake2
> > code as a roadmap for figuring out how SSE2 works. These guys over on
> > StackOverflow think they've got code to detect SIMD support and allow
> > a single binary to support both:
> >
> > http://stackoverflow.com/questions/6121792/how-to-check-if-a-cpu-supports-the-sse3-instruction-set
>
> That answer is crap. You cannot detect whether AVX is usable using
> just cpuid -- you need to use (IIRC) xgetbv as well.
Agree.
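For anyone following along, here is a minimal sketch of the check Andy
describes, assuming GCC or Clang on x86 (it relies on their <cpuid.h>; MSVC has
its own __cpuid and _xgetbv intrinsics instead). The names SimdSupport,
readXcr0 and detectSimd are made up for this example and are not taken from
any code mentioned in this thread. CPUID reports what the CPU can do; XGETBV
reports whether the OS actually saves the YMM registers. AVX is only usable
when both agree.

```cpp
#include <cstdint>

// Hypothetical result type for this sketch.
struct SimdSupport { bool sse2; bool avx; bool avx2; };

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>

// Read XCR0 with xgetbv (ECX = 0). Emitted as raw bytes so that old
// assemblers accept it. Only valid after CPUID has reported OSXSAVE.
static inline std::uint64_t readXcr0() {
    std::uint32_t lo, hi;
    __asm__ volatile(".byte 0x0f, 0x01, 0xd0" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

SimdSupport detectSimd() {
    SimdSupport s = {false, false, false};
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return s;

    s.sse2 = ((edx >> 26) & 1) != 0;            // CPUID.1:EDX bit 26
    const bool osxsave = ((ecx >> 27) & 1) != 0; // OS has enabled XSAVE/XGETBV
    const bool cpuAvx  = ((ecx >> 28) & 1) != 0; // CPU implements AVX

    // XCR0 bit 1 = XMM state, bit 2 = YMM state. Both must be set by the OS.
    const bool osSavesYmm = osxsave && ((readXcr0() & 0x6) == 0x6);
    s.avx = cpuAvx && osSavesYmm;

    // AVX2 is CPUID.(EAX=7,ECX=0):EBX bit 5 and needs the same OS support.
    if (s.avx && __get_cpuid_max(0, nullptr) >= 7) {
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        s.avx2 = ((ebx >> 5) & 1) != 0;
    }
    return s;
}
#else
// Non-x86 or unsupported compiler: report nothing and use portable code.
SimdSupport detectSimd() {
    SimdSupport s = {false, false, false};
    return s;
}
#endif
```

Call detectSimd() once at startup and cache the result. The point is the
ordering: xgetbv is only executed after the OSXSAVE bit has been seen, since
the instruction itself faults when the OS has not enabled it.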
> On gcc 4.8 and up, function multiversioning [1] is probably the way to
> go. On Windows (if you want to support MSVC), you'll need to do
> something different. Of course, function multiversioning has the same
> bug [2] and no one has fixed it yet.
>
> [1] http://gcc.gnu.org/wiki/FunctionMultiVersioning
> [2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55307
Disagree. That's cool and all, but worthless: it's Linux-only and buggy.
------
Bill, don't try to learn SS*E*/AVX* from the blake2-ref code. Besides being
buggy, it is really hard to read. For me it was actually easier to read code
(https://github.com/dchest/blake2b) in a language I don't know than to read
that. I was trying to figure out how Blake2b worked, and with the combination
of bad docs and bad code it was "impossible" (not worth a few hours) until I
found a simple implementation in Go.
The way I've been dealing with write "once", compile twice (Linux/Windows),
run the best code "everywhere" is: http://www.tobtu.com/files/rt-bench.zip
Specifically: src/common.cpp (getInstructionSets()), src/hashfactory.h, and
src\hash\*. Note there are better ways to do this. This was just one I did
a while ago, and I still look at it and think "well, I could change that and
it would be better". Also, this was written for a different purpose (hash
cracking with rainbow tables), but it can be used for hash cracking in general
and for parallel hashing. These implementations are all single block and are
severely length limited. Oh, right: it has support for detecting AVX2, but no
code is written for it, although it would be super easy to add. (I just wanted
to test it first, but at the time I don't think I knew about Intel's emulator,
and obviously AVX2 was not out yet [this was all written in late 2011 and
early 2012].) One way to make this better is to use templates and static
inline class functions to handle the minor differences between architectures.
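To make that last idea concrete, here is a rough sketch, not the rt-bench
code. Everything in it (ScalarOps, Sse2Ops, mixWords, pickMix, and the toy
add/rotate/xor step) is invented for illustration. The shared algorithm is
written once as a template, each architecture supplies a small class of
static inline functions, and a function pointer is picked at run time. It
assumes x86 intrinsics from <emmintrin.h> when the compiler defines __SSE2__.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Portable version: one 32-bit word per step.
struct ScalarOps {
    typedef std::uint32_t Vec;
    enum { lanes = 1 };
    static inline Vec load(const std::uint32_t* p) { return *p; }
    static inline void store(std::uint32_t* p, Vec v) { *p = v; }
    static inline Vec add(Vec a, Vec b) { return a + b; }
    static inline Vec bitXor(Vec a, Vec b) { return a ^ b; }
    static inline Vec rotl7(Vec a) { return (a << 7) | (a >> 25); }
};

#if defined(__SSE2__)
// SSE2 version: four 32-bit words per step, same operations.
struct Sse2Ops {
    typedef __m128i Vec;
    enum { lanes = 4 };
    static inline Vec load(const std::uint32_t* p) {
        return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    }
    static inline void store(std::uint32_t* p, Vec v) {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v);
    }
    static inline Vec add(Vec a, Vec b) { return _mm_add_epi32(a, b); }
    static inline Vec bitXor(Vec a, Vec b) { return _mm_xor_si128(a, b); }
    static inline Vec rotl7(Vec a) {
        return _mm_or_si128(_mm_slli_epi32(a, 7), _mm_srli_epi32(a, 25));
    }
};
#endif

// The algorithm, written once. n must be a multiple of Ops::lanes.
template <class Ops>
void mixWords(const std::uint32_t* a, const std::uint32_t* b,
              std::uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += Ops::lanes) {
        typename Ops::Vec x = Ops::load(a + i);
        typename Ops::Vec y = Ops::load(b + i);
        // out = rotl(a + b, 7) ^ b
        Ops::store(out + i, Ops::bitXor(Ops::rotl7(Ops::add(x, y)), y));
    }
}

typedef void (*MixFn)(const std::uint32_t*, const std::uint32_t*,
                      std::uint32_t*, std::size_t);

// Pick an implementation from a run-time capability flag.
MixFn pickMix(bool haveSse2) {
#if defined(__SSE2__)
    if (haveSse2) return &mixWords<Sse2Ops>;
#endif
    (void)haveSse2;
    return &mixWords<ScalarOps>;
}
```

One caveat for a real single binary: each SIMD instantiation should live in
its own source file built with its own flags (or use per-function target
attributes). If the whole program is compiled with something like -mavx2, the
compiler is free to use AVX2 in the "scalar" path too, and that defeats the
run-time check.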