lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 03 Apr 2014 12:57:03 -0700
From: Jeremi Gosney <>
Subject: Re: [PHC] Tortuga issues

On 4/3/2014 12:49 PM, Andy Lutomirski wrote:
> On Thu, Apr 3, 2014 at 12:44 PM, Jeremi Gosney <> wrote:
>> On 4/3/2014 12:40 PM, Andy Lutomirski wrote:
>>> On Thu, Apr 3, 2014 at 4:03 AM, Jeremi Gosney <> wrote:
>>>> On 4/2/2014 9:26 PM, Bill Cox wrote:
>>>>> Tortuga fails on both windows and Linux for > 1MiB m_cost, due to
>>>>> allocating hashing memory on the stack.
>>>> Just a heads-up, the optimized implementation of Pufferfish has this
>>>> `issue' as well, as it calls alloca() to dynamically allocate the sbox
>>>> buffers on the stack. The reference implementation allocates memory on
>>>> the heap with calloc() so this is not a problem there, but you'll blow
>>>> out the stack on the optimized implementation if using an m_cost > 10
>>>> (it doesn't "go to 11.")
>>>> And yes, this was done intentionally. Since it is unlikely that anyone
>>>> will be using an m_cost > 10, it's a mostly-safe optimization
>>>> (especially for attackers, which is largely what the optimized
>>>> implementation was, rewriting the algorithm from an attacker's perspective.)
>>>> For optimized defender code, where one might just be crazy enough to use
>>>> an m_cost of 11, there might be some benefit in writing a custom malloc
>>>> implementation that can quickly allocate heap memory without the
>>>> unnecessary overhead, not unlike JTR's mem_calloc_tiny(). But I think
>>>> this is implementation-specific detail that is outside the scope of the
>>>> PHC. Ideally implementers should be coding to the reference
>>>> implementation and making their own optimizations, using the optimized
>>>> code only as, erm, a reference.
>>> Remember that it's entirely possible that a PHC winner will be asked
>>> to compare an untrusted password to an unsalted hash, salt, and
>>> parameters.  Crashing isn't nice.
>>> [...]
>>> There's no probe, so, depending on the order in which the memory is
>>> accessed, this can shoot all the way past the guard page and turn into
>>> a standard buffer overflow.  (Of course, the data being written may
>>> not be easy to control, so it's mitigated a bit.)
>>> If you compile with -fstack-probe, you may get far better behavior.
>>> The code execution risk is gone (assuming that your threading library
>>> doesn't suck), and you can actually safely use a much larger amount of
>>> memory if you're in the main thread.
>>> On the other hand, using alloca for a one-time thing like this seems
>>> completely pointless.  A decent malloc can allocate a buffer in a few
>>> tens of ns.
>>> --Andy
>> You literally just re-stated everything that I said. Which is why I
>> bothered to say that the reference implementation does not use alloca(),
>> only the attacker-optimized code does.
> Sorry, I missed that defenders aren't supposed to use the optimized
> implementation.  I suppose that should be obvious, given that the code
> is a patch to JtR.

No worries.  And with that said, it probably wouldn't be a bad idea for
me to include some defender-optimized code as well. But I'm an attacker,
so of course the attacker-optimized code was priority ;)

But, the way I had envionsioned it, is that anyone implementing the
algorithm into a library, program, or whatever, would do so following
the reference implementation, making their own optimizations as they see
fit, and possibly referring to the optimized implementation to make
their optimizations.

Powered by blists - more mailing lists