phc-discussions - Re: [PHC] my pre (if) submit proposal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <715937984.107579.1390946276422.open-xchange@email.1and1.com>
Date: Tue, 28 Jan 2014 15:57:56 -0600 (CST)
From: Steve Thomas <steve@...tu.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] my pre (if) submit proposal

Before I start I should fix the formulas. I messed up m vs m/r and had t**(1/2)
vs (m/r)**(1/2). Sorry about that I had a metal lapse after a brain fart.

For m = t*r/2

mem = (3/2*(r+c)+2*r)*(m/r)**(1/2)
time = 2.5x

i values             normal cost  extra cost
i=0...m/r/2-1        m/r/2        +0
i=m/r/2...m/r-1      m/r/2        +m/r/2
i=m/r...3m/r/2-1     m/r/2        +m/r
i=3m/r/2-1...2m/r-1  m/r/2        +3m/r/2


For m = t*r/4

mem = (7/2*(r+c)+2*r)*(m/r)**(1/2)
time = 4.5x

i values             normal cost  extra cost
i=0...m/r/2-1        m/r/2        +0
i=m/r/2...m/r-1      m/r/2        +m/r/2
i=m/r...3m/r/2-1     m/r/2        +m/r
i=3m/r/2-1...2m/r-1  m/r/2        +3m/r/2
i=2m/r...5m/r/2-1    m/r/2        +2m/r
i=5m/r/2...3m/r-1    m/r/2        +5m/r/2
i=3m/r...7m/r/2-1    m/r/2        +3m/r
i=7m/r/2-1...4m/r-1  m/r/2        +7m/r/2


In general

x = | 1   t/(m/r) ≤ 1.5
    | 2   otherwise
sizeOfMem = ((t/(m/r) - 1/2) * (r+c) + x * r) * (m/r)**(1/2)
timeFactor = t/(m/r) + 1/2

And because I can't keep track of c, m, r, t (I'm so c-m-r-t
<https://www.youtube.com/watch?v=tcGQpjCztgA> [close enough :)]):
x = | 1   numberOfTimesMemoryIsRead ≤ 1.5
    | 2   otherwise
sizeOfMem = ((numberOfTimesMemoryIsRead - 1/2) * sizeOfState + x * sizeOfOutput)
* (blocksOfMemory)**(1/2)
timeFactor = numberOfTimesMemoryIsRead + 1/2


*** Note ***
This is for r divides m. Which you specify should not be the case in the paper,
but when r does not divide m this only adds a small amount of extra time-memory.


> On January 20, 2014 at 6:35 PM Krisztián Pintér <pinterkr@...il.com> wrote:
>
> Steve Thomas (at Monday, January 20, 2014, 11:14:16 AM):
>
> > How I got this
> >
> > Always store every t**(1/2) states except the last m/2 since those
> > are never read.
>
> i might spend some more time on this later, but for now, i don't
> understand some things. one is with this quote above. it seems that
> the notion of state and "slot" in the mem array is mixed. the last m/2
> memory locations will not be read. this is the last m/r/2 states, as a
> state stores r words (which might be even 21 for keccak[c=256] and 64
> bit).

Yeah I worded that poorly I should have said "Store every (m/r)**(1/2)th
state except states for the last m/r/2 since those are never read."

That's odd I knew this but just had a mental lapse. Pretty much replace
m with m/r. Oh also it's not "t**(1/2)" it's "(m/r)**(1/2)".


> also note that this is true only for f=-1 and m = t*r/2. which is not
> the case i intend to use in practice. for other cases, it is a little
> bit more complicated.

I only looked at f=-1 because in your paper you state it as a perfect choice. I
only skimmed enough to find that. Also f=-1 should never be used regardless if
the sponge function is reverseable or not.


> > For 2nd m/2 regenerate blocks of t**(1/2) R's when needed. This adds m/2
> > extra work.
>
> i don't get what do you mean by "m/2 extra work". m/2 states can't be
> recalculated in m/2 steps. you always need the previous state, and
> then the previous to that, etc, since you need the C part. that leads
> to, on average, (m/2)*t^0.5 / 2, if i'm not mistaken. anyway, so far
> it does not seem to be a break, we still have O(^2) combined, ^1.5 (if
> optimal parallelization is possible), which is expected in this early
> phase.

* m/r/2 not m/2 (sorry)

You have to recalculate m/r/2 blocks of data.

m/r = 64
(m/r)**(1/2) = 8

* is a saved state it has all the info needed to get the next state
- is unknown data
+ recalculated data
0 mem is zero
< reading pos
> writting pos

Starting at:
                               <>
*-------*-------*-------*-------00000000000000000000000000000000

recalculate data (technically not needed this time since you just
calculated this data)
                               <>
*-------*-------*-------*+++++++00000000000000000000000000000000

                       <                >
*-------*-------*-------*+++++++*-------000000000000000000000000

recalculate data
                       <                >
*-------*-------*+++++++*-------*-------000000000000000000000000

               <                                >
*-------*-------*+++++++*-------*-------*-------0000000000000000

recalculate data
               <                                >
*-------*+++++++*-------*-------*-------*-------0000000000000000

       <                                                >
*-------*+++++++*-------*-------*-------*-------*-------00000000

recalculate data
       <                                                >
*+++++++*-------*-------*-------*-------*-------*-------00000000

      <                                                  >
*++++++-*-------*-------*-------*-------*-------*-------*0000000

     <                                                    >
*+++++--*-------*-------*-------*-------*-------*-------*+000000
...
>                                                              <
*-------*-------*-------*-------*-------*-------*-------*+++++++


> > For 3rd m/2 regenerate blocks of t**(1/2) R's when needed then regenerate
> > blocks of t**(1/2) R's from the other R's. This adds m extra work.
>
> this i don't follow at all. this phase seems to be exactly the same as
> the previous. we read single written locations, are we not?

You keep the original states

! is a new saved state it has all the info needed to get the next state

Starting at:
>                                                              <
*-------*-------*-------*-------*-------*-------*-------*+++++++


        >                                              <
*-------*-------*-------*-------*-------*-------*-------*+++++++
!-------

recalculate data
        >                                              <
*-------*+++++++*-------*-------*-------*-------*-------*-------
!-------

more recalculate data
        >                                              <
*-------*+++++++*-------*-------*-------*-------*+++++++*-------
!-------

                >                               <
*-------*-------*-------*-------*-------*-------*+++++++*-------
!-------!-------

recalculate data
                >                               <
*-------*-------*+++++++*-------*-------*+++++++*-------*-------
!-------!-------

                        >               <
*-------*-------*-------*-------*-------*+++++++*-------*-------
!-------!-------!-------

...

                               <>
*-------*-------*-------*-------*-------*-------*-------*-------
!-------!-------!-------!+++++++

* Note you would keep the xored data in this part for the next part to save on
some caclutaltions.


> > You should note that you can do t**(1/3) and this increases the extra cost
> > by 2x
> > and memory by 2x for the constant in front of (r+c).
>
> and this is something i have no hope of understanding. how do you get
> these numbers? bonus question: what is the difference between catena
> and my access pattern that would cause this low exponent, while
> catena-n approaches 2?

m/r = 64
(m/r)**(1/3) = 4

Starting at:
                               <>
*---------------*---------------*0000000000000000000000000000000

recalculate states (technically not needed this time since you just
calculated these states)
                               <>
*---------------*---*---*---*---*0000000000000000000000000000000

recalculate data (technically not needed this time since you just
calculated this data)
                               <>
*---------------*---*---*---*+++*0000000000000000000000000000000

                           <        >
*---------------*---*---*---*+++*---0000000000000000000000000000

recalculate data
                           <        >
*---------------*---*---*+++*---*---0000000000000000000000000000

                       <                >
*---------------*---*---*+++*---*-------000000000000000000000000

recalculate data
                       <                >
*---------------*---*+++*---*---*-------000000000000000000000000

                   <                        >
*---------------*---*+++*---*---*-----------00000000000000000000

recalculate data
                   <                        >
*---------------*+++*---*---*---*-----------00000000000000000000

recalculate states
               <                                >
*---*---*---*---*---------------*---------------0000000000000000

recalculate data
               <                                >
*---*---*---*+++*---------------*---------------0000000000000000
...

As you can see you are doing two passes over the missing data to regenerate
Also ln(64) ≈ 4.159 so after you hit 64^(1/4) it will cost more memory and time.
Since we are dealing with discrete amounts it might be sooner.

I plan on looking at memory cheating for Catena some time soon. I believe the
way the access pattern works you can store every (2**N)th state. For the first
pass after initialization, each step you take you'll need to do (2**N)/2 steps
on average to get the data you are looking for. On the second pass you need to
do (2**N)/2 steps which need to do (2**N)/2 steps each so ((2**N)/2)**2 steps.
Third pass it's ((2**N)/2)**3 steps and so on. Catena-3 is three passes and is
considered the minimum. I think they are/were debating between Catena-3 and
Catena-4. Hmm I see Bill has an argument for Catena-2.
Content of type "text/html" skipped