[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140310105207.GA10622@openwall.com>
Date: Mon, 10 Mar 2014 14:52:07 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply latency reduction via table lookups
On Mon, Mar 10, 2014 at 08:40:50AM +0000, Samuel Neves wrote:
> On 10-03-2014 08:02, Solar Designer wrote:
> > Now, a 16x16->32 lookup table is still pretty large. A naive
> > implementation of it would take 16 GiB. A smarter implementation would
> > probably halve that, due to multiplication being commutative. (Can we
> > do better yet?)
>
> You can use the quarter-squares identity [1] to keep that table at
> around 2^16 entries,
More precisely, at around 2^17, I think, because one of the lookups is
by sum of two 16-bit inputs.
> in exchange for 2 memory accesses instead of 1.
> This seems to be a better tradeoff than 4 accesses (or 3 using
> Karatsuba) with an 8x8 table, under the assumption that additions and
> subtractions are 'free'.
Wow. Yes, this might work better.
> [1]
> https://en.wikipedia.org/wiki/Multiplication_algorithm#Quarter_square_multiplication
I should have found the Wikipedia page before posting.
Thank you!
Alexander
Powered by blists - more mailing lists