Message-ID: <CAAS2fgTK_S2XMrvGbhyqd+KuuLVFhOrziFsYv+0EePngtgOuqg@mail.gmail.com>
Date: Wed, 1 Apr 2015 15:48:40 +0000
From: Gregory Maxwell <gmaxwell@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Compute time hardness

On Wed, Apr 1, 2015 at 3:20 PM, Bill Cox <waywardgeek@...il.com> wrote:
> This is pretty difficult to guesstimate, since we don't seem to have any
> highly experienced deep-submicron IC designers involved on this list.
> Maybe we should try to rope one in from Intel, AMD, or NVIDIA?
>
> We see good evidence that a multiply is at least 3X more computationally
> expensive. If it were not, Intel would have a lower-latency multiplier.
> Most of the delay around an ADD operation is in moving the data around,
> not doing the addition, so there's an additional speed factor we are not
> sure of. I think it will be pretty high, like 8X or 16X compared to a
> 32x32 -> 64 multiply.

There are standards for this, e.g. ITU basicops, but they're really quite
rubbish and I can pretty much promise that your random guesses will be
better.

It's not clear whether the costs I was asking about behave more like gate
area, more like power, or more like latency -- they're all different when
you get into the details.

What I was curious about was mostly whether there are any large separations
among the leaders of the high memory fill-rate/bandwidth algorithms; e.g.
is one doing a much better job of making use of processor resources than
another which is otherwise close in terms of bandwidth? If there is a large
separation, it may not matter how it's counted.

(The rationale being that among algorithms which are relatively closely
matched in their ability to use memory area and bandwidth, one may prefer
the algorithm that requires more computation (however you measure it) per
unit of wall-clock time, as a hedge against the economics of attacking the
memory area/bandwidth changing in the future.)
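[For readers outside the thread: the kind of multiplication-latency hardening
being discussed can be illustrated with a small C sketch. This is not code
from any PHC entry; the function name, constants, and round count are
illustrative only. The point is that a serial, data-dependent chain of
32x32 -> 64 multiplies pins the attacker's wall-clock time to multiplier
latency times chain length, no matter how many multipliers they build,
whereas an ADD in the same position would be both lower latency and far
cheaper in gate area.]

/*
 * Hypothetical sketch of a multiplication-latency chain.
 * Each iteration's multiply depends on the previous result, so the
 * multiplies cannot be issued in parallel; total latency is roughly
 * rounds * (multiplier latency), which is the "compute time hardness"
 * being compared against add-only constructions above.
 */
#include <stdint.h>

uint64_t mul_latency_chain(uint64_t x, uint32_t rounds)
{
    for (uint32_t i = 0; i < rounds; i++) {
        uint32_t lo = (uint32_t)x;          /* low 32 bits of state */
        uint32_t hi = (uint32_t)(x >> 32);  /* high 32 bits of state */
        /* 32x32 -> 64 multiply feeding directly into the next round. */
        x = (uint64_t)lo * hi + x;
    }
    return x;
}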