Message-ID: <550A0FFE.9070805@cesarb.eti.br>
Date: Wed, 18 Mar 2015 20:53:34 -0300
From: Cesar Eduardo Barros <cesarb@...arb.eti.br>
To: mancha <mancha1@...o.com>, Stephan Mueller <smueller@...onox.de>
CC: Hannes Frederic Sowa <hannes@...essinduktion.org>,
Daniel Borkmann <daniel@...earbox.net>, tytso@....edu,
linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org,
herbert@...dor.apana.org.au, dborkman@...hat.com
Subject: Re: [BUG/PATCH] kernel RNG and its secrets
On 18-03-2015 14:14, mancha wrote:
> On Wed, Mar 18, 2015 at 05:02:01PM +0100, Stephan Mueller wrote:
>> On Wednesday, 18 March 2015, at 16:09:34, Hannes Frederic Sowa wrote:
>>> Seems like just using barrier() is the best and easiest option.
>
> However, if the idea is to use barrier() instead of OPTIMIZER_HIDE_VAR()
> in crypto_memneq() as well, then patch 0002 is the one to use. Please
> review and keep in mind my analysis was limited to memzero_explicit().
>
> Cesar, were there reasons you didn't use the gcc version of barrier()
> for crypto_memneq()?

Yes. Two reasons.

Take a look at how barrier() is defined:

#define barrier() __asm__ __volatile__("": : :"memory")

It tells gcc that the dummy assembly "instruction" may read or write any
memory (so the compiler can't assume anything about memory contents), and
that no memory access should be moved across the barrier in either
direction.
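
To make the memory-only effect concrete, here is a minimal sketch, assuming gcc or clang extended asm. The helper name sum_two_reads() is hypothetical and exists only for illustration; the barrier() definition is the one quoted above.

```c
/* Same definition as quoted above (GCC/Clang extended asm). */
#define barrier() __asm__ __volatile__("": : :"memory")

/* Hypothetical helper, for illustration only. */
static int sum_two_reads(const int *p)
{
	int first = *p;

	/*
	 * The "memory" clobber says the asm may have changed *p, so the
	 * compiler should not reuse 'first' for the second load; it has
	 * to read *p again. Nothing is said about values that live only
	 * in registers.
	 */
	barrier();

	int second = *p;

	return first + second;
}
```

Without the barrier() the compiler would be free to load *p once and double it; with it, two loads are expected in the generated code.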
The definition mentions nothing about registers. Therefore, as far as I know, gcc can
assume that the dummy "instruction" touches no integer registers (or
restores their values). I can imagine a sufficiently perverse compiler
using that fact to introduce timing-dependent computations. For
instance, it could load the values using more than one register and
combine them at the end, after the barriers; there, it could exit early
in case one of the registers is all-ones. My definition of
OPTIMIZER_HIDE_VAR introduces a data dependency to prevent that:

#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

The second reason is that barrier() is too strong. For crypto_memneq,
only the or-chain is critical; the order or width of the loads makes no
difference. The compiler could, if it wishes, do all the loads and xors
first and do the or-chain at the end, or whenever it can see a pipeline
bubble; it doesn't matter as long as it does *all* the "or" operations,
in sequence.
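
A rough sketch of the or-chain idea, assuming gcc or clang extended asm. This is a simplified byte-at-a-time loop written for illustration, not the actual crypto_memneq() code, and the name memneq_sketch() is hypothetical.

```c
#include <stddef.h>

/* Definition as quoted above: ties var's old value to its new value. */
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/*
 * Hypothetical, simplified comparison: returns 0 if the buffers are
 * equal and non-zero otherwise.
 */
static unsigned long memneq_sketch(const void *a, const void *b, size_t size)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	unsigned long neq = 0;

	while (size > 0) {
		/* The loads and the xor may be scheduled freely... */
		neq |= (unsigned long)(*pa ^ *pb);
		/*
		 * ...but each "or" feeds the next one through the asm, so
		 * the compiler cannot tell what neq holds and has no basis
		 * for leaving the loop early.
		 */
		OPTIMIZER_HIDE_VAR(neq);
		pa++;
		pb++;
		size--;
	}

	return neq;
}
```

Only neq is hidden here; the loads carry no barrier, which matches the point that their order and width do not matter.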
I would be comfortable with a stronger OPTIMIZER_HIDE_VAR (adding
"memory" or volatile), even though it could limit optimization
opportunities, but I wouldn't be comfortable with a weaker
OPTIMIZER_HIDE_VAR (removing the data dependency), unless the gcc and
clang guys promise that our definition of barrier() will always prevent
undesired optimization of register-only operations.
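
For comparison, a sketch of what the stronger variant could look like, assuming gcc or clang extended asm. The names OPTIMIZER_HIDE_VAR_STRONG and or_chain_strong() are hypothetical and are not proposed kernel identifiers; the macro keeps the data dependency and adds both volatile and the "memory" clobber.

```c
/*
 * Hypothetical stronger variant: same "=r"/"0" data dependency as
 * OPTIMIZER_HIDE_VAR, plus volatile and a "memory" clobber.
 */
#define OPTIMIZER_HIDE_VAR_STRONG(var) \
	__asm__ __volatile__("" : "=r" (var) : "0" (var) : "memory")

/* Hypothetical helper: ORs n words together, hiding the accumulator. */
static unsigned long or_chain_strong(const unsigned long *words,
				     unsigned long n)
{
	unsigned long acc = 0;
	unsigned long i;

	for (i = 0; i < n; i++) {
		acc |= words[i];
		/* acc is hidden, and memory accesses stay on their side. */
		OPTIMIZER_HIDE_VAR_STRONG(acc);
	}

	return acc;
}
```

The result is the same as with the weaker macro; the cost is that loads can no longer be hoisted or merged across iterations.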
There was a third reason for the exact definition of OPTIMIZER_HIDE_VAR:
it was copied from RELOC_HIDE, which is a longstanding "hide this
variable from gcc" operation, and thus known to work as expected.
--
Cesar Eduardo Barros
cesarb@...arb.eti.br