lists.openwall.net  lists / announce owlusers owldev johnusers johndev passwdqcusers yescrypt popa3dusers / osssecurity kernelhardening musl sabotage tlsify passwords / cryptdev xvendor / Bugtraq FullDisclosure linuxkernel linuxnetdev linuxext4 linuxhardening PHC  
Open Source and information security mailing list archives
 

Date: Tue, 28 Jan 2014 15:57:56 0600 (CST)
From: Steve Thomas <steve@...tu.com>
To: discussions@...swordhashing.net
Subject: Re: [PHC] my pre (if) submit proposal
Before I start I should fix the formulas. I messed up m vs m/r and had t**(1/2)
vs (m/r)**(1/2). Sorry about that I had a metal lapse after a brain fart.
For m = t*r/2
mem = (3/2*(r+c)+2*r)*(m/r)**(1/2)
time = 2.5x
i values normal cost extra cost
i=0...m/r/21 m/r/2 +0
i=m/r/2...m/r1 m/r/2 +m/r/2
i=m/r...3m/r/21 m/r/2 +m/r
i=3m/r/21...2m/r1 m/r/2 +3m/r/2
For m = t*r/4
mem = (7/2*(r+c)+2*r)*(m/r)**(1/2)
time = 4.5x
i values normal cost extra cost
i=0...m/r/21 m/r/2 +0
i=m/r/2...m/r1 m/r/2 +m/r/2
i=m/r...3m/r/21 m/r/2 +m/r
i=3m/r/21...2m/r1 m/r/2 +3m/r/2
i=2m/r...5m/r/21 m/r/2 +2m/r
i=5m/r/2...3m/r1 m/r/2 +5m/r/2
i=3m/r...7m/r/21 m/r/2 +3m/r
i=7m/r/21...4m/r1 m/r/2 +7m/r/2
In general
x =  1 t/(m/r) ≤ 1.5
 2 otherwise
sizeOfMem = ((t/(m/r)  1/2) * (r+c) + x * r) * (m/r)**(1/2)
timeFactor = t/(m/r) + 1/2
And because I can't keep track of c, m, r, t (I'm so cmrt
<https://www.youtube.com/watch?v=tcGQpjCztgA> [close enough :)]):
x =  1 numberOfTimesMemoryIsRead ≤ 1.5
 2 otherwise
sizeOfMem = ((numberOfTimesMemoryIsRead  1/2) * sizeOfState + x * sizeOfOutput)
* (blocksOfMemory)**(1/2)
timeFactor = numberOfTimesMemoryIsRead + 1/2
*** Note ***
This is for r divides m. Which you specify should not be the case in the paper,
but when r does not divide m this only adds a small amount of extra timememory.
> On January 20, 2014 at 6:35 PM Krisztián Pintér <pinterkr@...il.com> wrote:
>
> Steve Thomas (at Monday, January 20, 2014, 11:14:16 AM):
>
> > How I got this
> >
> > Always store every t**(1/2) states except the last m/2 since those
> > are never read.
>
> i might spend some more time on this later, but for now, i don't
> understand some things. one is with this quote above. it seems that
> the notion of state and "slot" in the mem array is mixed. the last m/2
> memory locations will not be read. this is the last m/r/2 states, as a
> state stores r words (which might be even 21 for keccak[c=256] and 64
> bit).
Yeah I worded that poorly I should have said "Store every (m/r)**(1/2)th
state except states for the last m/r/2 since those are never read."
That's odd I knew this but just had a mental lapse. Pretty much replace
m with m/r. Oh also it's not "t**(1/2)" it's "(m/r)**(1/2)".
> also note that this is true only for f=1 and m = t*r/2. which is not
> the case i intend to use in practice. for other cases, it is a little
> bit more complicated.
I only looked at f=1 because in your paper you state it as a perfect choice. I
only skimmed enough to find that. Also f=1 should never be used regardless if
the sponge function is reverseable or not.
> > For 2nd m/2 regenerate blocks of t**(1/2) R's when needed. This adds m/2
> > extra work.
>
> i don't get what do you mean by "m/2 extra work". m/2 states can't be
> recalculated in m/2 steps. you always need the previous state, and
> then the previous to that, etc, since you need the C part. that leads
> to, on average, (m/2)*t^0.5 / 2, if i'm not mistaken. anyway, so far
> it does not seem to be a break, we still have O(^2) combined, ^1.5 (if
> optimal parallelization is possible), which is expected in this early
> phase.
* m/r/2 not m/2 (sorry)
You have to recalculate m/r/2 blocks of data.
m/r = 64
(m/r)**(1/2) = 8
* is a saved state it has all the info needed to get the next state
 is unknown data
+ recalculated data
0 mem is zero
< reading pos
> writting pos
Starting at:
<>
****00000000000000000000000000000000
recalculate data (technically not needed this time since you just
calculated this data)
<>
****+++++++00000000000000000000000000000000
< >
****+++++++*000000000000000000000000
recalculate data
< >
***+++++++**000000000000000000000000
< >
***+++++++***0000000000000000
recalculate data
< >
**+++++++****0000000000000000
< >
**+++++++*****00000000
recalculate data
< >
*+++++++******00000000
< >
*++++++*******0000000
< >
*+++++*******+000000
...
> <
********+++++++
> > For 3rd m/2 regenerate blocks of t**(1/2) R's when needed then regenerate
> > blocks of t**(1/2) R's from the other R's. This adds m extra work.
>
> this i don't follow at all. this phase seems to be exactly the same as
> the previous. we read single written locations, are we not?
You keep the original states
! is a new saved state it has all the info needed to get the next state
Starting at:
> <
********+++++++
> <
********+++++++
!
recalculate data
> <
**+++++++******
!
more recalculate data
> <
**+++++++*****+++++++*
!
> <
*******+++++++*
!!
recalculate data
> <
***+++++++***+++++++**
!!
> <
******+++++++**
!!!
...
<>
********
!!!!+++++++
* Note you would keep the xored data in this part for the next part to save on
some caclutaltions.
> > You should note that you can do t**(1/3) and this increases the extra cost
> > by 2x
> > and memory by 2x for the constant in front of (r+c).
>
> and this is something i have no hope of understanding. how do you get
> these numbers? bonus question: what is the difference between catena
> and my access pattern that would cause this low exponent, while
> catenan approaches 2?
m/r = 64
(m/r)**(1/3) = 4
Starting at:
<>
***0000000000000000000000000000000
recalculate states (technically not needed this time since you just
calculated these states)
<>
******0000000000000000000000000000000
recalculate data (technically not needed this time since you just
calculated this data)
<>
*****+++*0000000000000000000000000000000
< >
*****+++*0000000000000000000000000000
recalculate data
< >
****+++**0000000000000000000000000000
< >
****+++**000000000000000000000000
recalculate data
< >
***+++***000000000000000000000000
< >
***+++***00000000000000000000
recalculate data
< >
**+++****00000000000000000000
recalculate states
< >
******0000000000000000
recalculate data
< >
****+++**0000000000000000
...
As you can see you are doing two passes over the missing data to regenerate
Also ln(64) ≈ 4.159 so after you hit 64^(1/4) it will cost more memory and time.
Since we are dealing with discrete amounts it might be sooner.
I plan on looking at memory cheating for Catena some time soon. I believe the
way the access pattern works you can store every (2**N)th state. For the first
pass after initialization, each step you take you'll need to do (2**N)/2 steps
on average to get the data you are looking for. On the second pass you need to
do (2**N)/2 steps which need to do (2**N)/2 steps each so ((2**N)/2)**2 steps.
Third pass it's ((2**N)/2)**3 steps and so on. Catena3 is three passes and is
considered the minimum. I think they are/were debating between Catena3 and
Catena4. Hmm I see Bill has an argument for Catena2.
Content of type "text/html" skipped
Powered by blists  more mailing lists