linux-kernel - Re: Updated scalable urandom patchkit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151013024645.GA31306@thunk.org>
Date:	Mon, 12 Oct 2015 22:46:45 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	George Spelvin <linux@...izon.com>
Cc:	ahferroin7@...il.com, andi@...stfloor.org, jepler@...ythonic.net,
	linux-kernel@...r.kernel.org, linux@...musvillemoes.dk
Subject: Re: Updated scalable urandom patchkit

On Mon, Oct 12, 2015 at 04:30:59PM -0400, George Spelvin wrote:
> > Segregating abusers solves both problems.  If we do this then we don't
> > need to drop the locks from the nonblocking pool, which solves the
> > security problem.
> 
> Er, sort of.  I still think my points were valid, but they're
> about a particular optimization suggestion you had.  By avoiding
> the need for the optimization, the entire issue is mooted.

Sure, I'm not in love with anyone's particular optimization, whether
it's mine, yours, or Andi's.  I'm just trying to solve the scalability
problem while also trying to keep the code maintainable and easy to
understand (and over the years we've actually made things worse, to
the extent that having a single mixing for the input and output pools
is starting to be more of problem than a feature, since we're coding
in a bunch of exceptions when it's the output pool, etc.).

So if we can solve a problem by routing around it, that's fine in my
book.

> You have to copy the state *anyway* because you don't want it overwritten
> by the ChaCha output, so there's really no point storing the constants.
> (Also, ChaCha has a simpler input block structure than Salsa20; the
> constants are all adjacent.)

We're really getting into low-level implementations here, and I think
it's best to worry about these sorts of things when we have a patch to
review.....

> (Note: one problem with ChaCha specifically is that is needs 16x32 bits
> of registers, and Arm32 doesn't quite have enough.  We may want to provide
> an arch CPRNG hook so people can plug in other algorithms with good
> platform support, like x86 AES instructions.)

So while a ChaCha20-based CRNG should be faster than a SHA-1 based
CRNG, and I consider this a good thing, for me speed is **not** more
important than keeping the underlying code maintainable and simple.
This is one of the reasons why I looked at, and then discarded, to use
x86 accelerated AES as the basis for a CRNG.  Setting up AES so that
it can be used easily with or without hardware acceleration looks very
complicated to do in a cross-architectural way, and I don't want to
drag in all of the crypto layer for /dev/random.

> The same variables can be used (with different parameters) to decide if
> we want to get out of mitigation mode.  The one thing to watch out for
> is that "cat </dev/urandom >/dev/sdX" may have some huge pauses once
> the buffer cache fills.  We don't want to forgive after too small a
> fixed interval.

At least initially, once we go into mitigation mode for a particular
process, it's probably safer to simply not exit it.

> Finally, we have the issue of where to attach this rate-limiting structure
> and crypto context.  My idea was to use the struct file.  But now that
> we have getrandom(2), it's harder.  mm, task_struct, signal_struct, what?

I'm personally more inclined to keep it with the task struct, so that
different threads will use different crypto contexts, just from
simplicity point of view since we won't need to worry about locking.

Since many processes don't use /dev/urandom or getrandom(2) at all,
the first time they do, we'd allocate a structure and hang it off the
task_struct.  When the process exits, we would explicitly memzero it
and then release the memory.

> (Post-finally, do we want this feature to be configurable under
> CONFIG_EMBEDDED?  I know keeping the /dev/random code size small is
> a speficic design goal, and abuse mitigation is optional.)

Once we code it up we can see how many bytes this takes, we can have
this discussion.  I'll note that ChaCha20 is much more compact than SHA1:

   text	   data	    bss	    dec	    hex	filename
   4230	      0	      0	   4230	   1086	/build/ext4-64/lib/sha1.o
   1152	    304	      0	   1456	    5b0	/build/ext4-64/crypto/chacha20_generic.o

... and I've thought about this as being the first step towards
potentially replacing SHA1 with something ChaCha20 based, in light of
the SHAppening attack.  Unfortunately, BLAKE2s is similar to ChaCha
only from design perspective, not an implementation perspective.
Still, I suspect the just looking at the crypto primitives, even if we
need to include two independent copies of the ChaCha20 core crypto and
the Blake2s core crypto, it still should be about half the size of the
SHA-1 crypto primitive.

And from the non-plumbing side of things, Andi's patchset increases
the size of /dev/random by a bit over 6%, or 974 bytes from a starting
base of 15719 bytes.  It ought to be possible to implement a ChaCha20
based CRNG (ignoring the crypto primitives) in less than 974 bytes of
x86_64 assembly.  :-)

						- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/