Date:   Wed, 7 Mar 2018 00:18:56 +0100
From:   Pavel Machek <pavel@....cz>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, pageexec@...email.hu
Subject: Re: C tricks for efficient stack zeroing

Hi!

> do_something(u8 *output, const u8 *input)
>     thing1(...)
>     thing2(...)
>         thinga(...)
>         thingb(...)
>            thingi(...)
>         thingc(...)
>     thing3(...)
>     thing4(...)
>         thinga(...)
>         thingc(...)
> 
> Each one of these functions has a few stack variables. The current
> solution is to call memzero_explicit() on each of those stack
> variables when each function returns. But let's say that thingb uses as
> much or more stack as thinga. In this case, I'm wasting cycles (and
> gcc optimizations) by clearing the stack in both thinga and thingb,
> and I could probably get away with doing this in thingb only.
> Probably. But to hand estimate those seems a bit brittle.
> 
> What would be really nice would be to somehow keep track of the
> maximum stack depth, and just before the function returns, clear from
> the maximum depth to its stack base, all in one single call. This
> would not only make the code faster and less brittle, but it would
> also clean up some algorithms quite a bit.
> 
> Ideally this would take the form of a gcc attribute on the function,
> but I was unable to find anything of that nature. I started looking
> for little C tricks for this, and came up dry too. I realize I could

I probably can't help you, but...

Is it possible that the code running _with_ zeroing would actually be
faster, performance-wise?

You know, after calling the crypto function, the CPU has 2K of dirty
data in its caches. You really don't need that data to be written back
to DRAM; you'd prefer that it simply be discarded. (And it should be
easier to discard zeros than to discard non-zero data.)

Now, I'm not saying common CPUs could take advantage of this, but I
believe at least the belt machine did something similar in hardware (
https://www.youtube.com/watch?v=QGw-cy0ylCc&list=PLx54dE17v2I2WG7tMybzhbJ81rTyJMJdU&index=2
)
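
For reference, here is a rough userspace sketch of the two pieces under
discussion: a zeroing helper the compiler cannot optimize away, and a
single wipe at the end of the outermost function. It does not track the
maximum stack depth as the quoted mail asks for; instead it swaps in a
simpler technique, keeping every secret temporary in one scratch struct
so that one call clears all of it. The names explicit_zero and struct
scratch, the buffer sizes, and the arithmetic in thinga/thingb are all
made up for illustration, and the empty asm statement assumes GCC or
Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero n bytes at p. The empty asm with a "memory" clobber (a GCC/Clang
 * extension) keeps the compiler from dropping the memset as a dead store. */
void explicit_zero(void *p, size_t n)
{
    memset(p, 0, n);
    __asm__ __volatile__("" : : "r"(p) : "memory");
}

/* Hypothetical scratch area: all secret temporaries of the call tree
 * live here instead of in each helper's own stack frame. */
struct scratch {
    uint8_t a[32];
    uint8_t b[64];
};

/* Placeholder helper: derives a[] from 32 bytes of input. */
void thinga(struct scratch *s, const uint8_t *in)
{
    for (size_t i = 0; i < sizeof(s->a); i++)
        s->a[i] = in[i] ^ 0x5c;
}

/* Placeholder helper: derives b[] from a[]. */
void thingb(struct scratch *s)
{
    for (size_t i = 0; i < sizeof(s->b); i++)
        s->b[i] = (uint8_t)(s->a[i % sizeof(s->a)] + i);
}

/* Reads 32 bytes from input and writes 32 bytes to output. */
void do_something(uint8_t *output, const uint8_t *input)
{
    struct scratch s;

    thinga(&s, input);
    thingb(&s);
    for (size_t i = 0; i < sizeof(s.a); i++)
        output[i] = s.a[i] ^ s.b[i];

    /* One wipe here replaces a wipe in every helper. */
    explicit_zero(&s, sizeof(s));
}
```

This only covers data that the helpers put in the scratch struct on
purpose. Register spills and compiler-generated temporaries can still
end up in the helpers' own frames, which is why the quoted mail asks
for something that clears down to the maximum stack depth.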

Best regards,

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
