linux-kernel - Re: Updated scalable urandom patchkit

Message-Id: <1444708222.900.0@smtp.gmail.com>
Date:	Mon, 12 Oct 2015 20:50:22 -0700
From:	Raymond Jennings <shentino@...il.com>
To:	Theodore Ts'o <tytso@....edu>, George Spelvin <linux@...izon.com>,
	ahferroin7@...il.com, andi@...stfloor.org, jepler@...ythonic.net,
	linux-kernel@...r.kernel.org, linux@...musvillemoes.dk
Subject: Re: Updated scalable urandom patchkit



On Mon, Oct 12, 2015 at 7:46 PM, Theodore Ts'o <tytso@....edu> wrote:
> On Mon, Oct 12, 2015 at 04:30:59PM -0400, George Spelvin wrote:
>>  > Segregating abusers solves both problems.  If we do this then we
>>  > don't need to drop the locks from the nonblocking pool, which
>>  > solves the security problem.
>> 
>>  Er, sort of.  I still think my points were valid, but they're
>>  about a particular optimization suggestion you had.  By avoiding
>>  the need for the optimization, the entire issue is mooted.
> 
> Sure, I'm not in love with anyone's particular optimization, whether
> it's mine, yours, or Andi's.  I'm just trying to solve the scalability
> problem while also trying to keep the code maintainable and easy to
> understand (and over the years we've actually made things worse, to
> the extent that having a single mixing for the input and output pools
> is starting to be more of a problem than a feature, since we're coding
> in a bunch of exceptions when it's the output pool, etc.).
> 
> So if we can solve a problem by routing around it, that's fine in my
> book.
> 
>>  You have to copy the state *anyway* because you don't want it
>>  overwritten by the ChaCha output, so there's really no point storing
>>  the constants.  (Also, ChaCha has a simpler input block structure
>>  than Salsa20; the constants are all adjacent.)
> 
> We're really getting into low-level implementations here, and I think
> it's best to worry about these sorts of things when we have a patch to
> review.....
> 
>>  (Note: one problem with ChaCha specifically is that it needs 16x32
>>  bits of registers, and Arm32 doesn't quite have enough.  We may want
>>  to provide an arch CPRNG hook so people can plug in other algorithms
>>  with good platform support, like x86 AES instructions.)
> 
> So while a ChaCha20-based CRNG should be faster than a SHA-1 based
> CRNG, and I consider this a good thing, for me speed is **not** more
> important than keeping the underlying code maintainable and simple.
> This is one of the reasons why I looked at, and then discarded, the
> idea of using x86 accelerated AES as the basis for a CRNG.  Setting up
> AES so that
> it can be used easily with or without hardware acceleration looks very
> complicated to do in a cross-architectural way, and I don't want to
> drag in all of the crypto layer for /dev/random.
> 
>>  The same variables can be used (with different parameters) to decide
>>  if we want to get out of mitigation mode.  The one thing to watch out
>>  for is that "cat </dev/urandom >/dev/sdX" may have some huge pauses
>>  once the buffer cache fills.  We don't want to forgive after too small
>>  a fixed interval.
> 
> At least initially, once we go into mitigation mode for a particular
> process, it's probably safer to simply not exit it.
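
A rough sketch of the "enter mitigation mode and never leave" policy, written as a userspace Python model rather than kernel code. The window length, byte threshold, and all names below are hypothetical; nothing in this thread fixes them.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 1.0   # hypothetical accounting window
ABUSE_BYTES = 1 << 20  # hypothetical per-window threshold (1 MiB)

@dataclass
class ReaderState:
    """Per-reader accounting; where to attach it is a separate question."""
    window_start: float = 0.0
    bytes_in_window: int = 0
    mitigated: bool = False

def account_read(st, nbytes, now):
    """Record a read of nbytes at time now; return True if mitigated."""
    if st.mitigated:
        return True  # sticky: a long pause does not forgive the reader
    if now - st.window_start >= WINDOW_SECONDS:
        st.window_start = now  # start a fresh accounting window
        st.bytes_in_window = 0
    st.bytes_in_window += nbytes
    if st.bytes_in_window > ABUSE_BYTES:
        st.mitigated = True
    return st.mitigated
```

Because the flag is never cleared, a reader that stalls for a long time (for example while the buffer cache drains) stays in mitigation mode when it resumes, which sidesteps the question of how long a forgiveness interval should be.
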
> 
>>  Finally, we have the issue of where to attach this rate-limiting
>>  structure and crypto context.  My idea was to use the struct file.
>>  But now that we have getrandom(2), it's harder.  mm, task_struct,
>>  signal_struct, what?
> 
> I'm personally more inclined to keep it with the task struct, so that
> different threads will use different crypto contexts, purely from a
> simplicity point of view, since we won't need to worry about locking.
> 
> Since many processes don't use /dev/urandom or getrandom(2) at all,
> the first time they do, we'd allocate a structure and hang it off the
> task_struct.  When the process exits, we would explicitly memzero it
> and then release the memory.
> 
>>  (Post-finally, do we want this feature to be configurable under
>>  CONFIG_EMBEDDED?  I know keeping the /dev/random code size small is
>>  a specific design goal, and abuse mitigation is optional.)
> 
> Once we code it up and can see how many bytes this takes, we can have
> this discussion.  I'll note that ChaCha20 is much more compact than
> SHA1:
> 
>    text	   data	    bss	    dec	    hex	filename
>    4230	      0	      0	   4230	   1086	/build/ext4-64/lib/sha1.o
>    1152	    304	      0	   1456	    5b0	/build/ext4-64/crypto/chacha20_generic.o
> 
> ... and I've thought about this as being the first step towards
> potentially replacing SHA1 with something ChaCha20 based, in light of
> the SHAppening attack.  Unfortunately, BLAKE2s is similar to ChaCha
> only from a design perspective, not an implementation perspective.
> Still, I suspect that just looking at the crypto primitives, even if we
> need to include two independent copies of the ChaCha20 core crypto and
> the Blake2s core crypto, it still should be about half the size of the
> SHA-1 crypto primitive.
> 
> And from the non-plumbing side of things, Andi's patchset increases
> the size of /dev/random by a bit over 6%, or 974 bytes from a starting
> base of 15719 bytes.  It ought to be possible to implement a ChaCha20
> based CRNG (ignoring the crypto primitives) in less than 974 bytes of
> x86_64 assembly.  :-)
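
For reference, the size figures quoted above can be checked with a few lines of Python. The numbers are taken directly from the size(1) output and the patchset figures in this message; only the variable names are made up.

```python
# Section sizes in bytes, copied from the size(1) output quoted above.
sha1 = {"text": 4230, "data": 0, "bss": 0}
chacha20 = {"text": 1152, "data": 304, "bss": 0}

def total(obj):
    """Sum of text + data + bss, i.e. the 'dec' column."""
    return sum(obj.values())

# ChaCha20 object size as a fraction of the SHA-1 object size.
ratio = total(chacha20) / total(sha1)

# Growth of /dev/random from Andi's patchset: 974 bytes on a 15719 byte base.
growth = 974 / 15719

print(f"chacha20 / sha1 = {ratio:.2f}")
print(f"patchset growth = {growth * 100:.1f}%")
```

The ChaCha20 object comes to roughly a third of the SHA-1 object, which leaves room for a second primitive of similar size within the "about half" estimate.
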
> 
> 						- Ted
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

This might be stupid, but could something asynchronous work?  Perhaps
have the entropy generators dump their entropy into a central pool via
a circular buffer, and have a background kthread manage the per-cpu or
per-process entropy pools?
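
A very rough userspace sketch of that idea in Python, just to show the shape of it. A bounded `queue.Queue` stands in for the circular buffer, a thread stands in for the kthread, SHA-256 stands in for whatever mixing function would really be used, and the sketch skips the central pool and feeds the per-CPU pools directly. All names and sizes are hypothetical.

```python
import hashlib
import queue
import threading

RING_SLOTS = 64  # hypothetical buffer capacity
NUM_POOLS = 4    # hypothetical number of per-CPU pools

ring = queue.Queue(maxsize=RING_SLOTS)  # bounded FIFO as the buffer
pools = [bytes(32) for _ in range(NUM_POOLS)]

def submit_entropy(sample):
    """Producer side: enqueue without blocking; drop the sample if full."""
    try:
        ring.put_nowait(sample)
        return True
    except queue.Full:
        return False

def mixer():
    """Background worker: drain the buffer and mix into pools in turn."""
    turn = 0
    while True:
        sample = ring.get()
        if sample is None:  # sentinel: stop the worker
            break
        i = turn % NUM_POOLS
        # Placeholder mixing step, not the real pool mixing function.
        pools[i] = hashlib.sha256(pools[i] + sample).digest()
        turn += 1

def run_demo(samples):
    """Start the worker, feed it samples, and wait for it to finish."""
    worker = threading.Thread(target=mixer, daemon=True)
    worker.start()
    accepted = sum(submit_entropy(s) for s in samples)
    ring.put(None)
    worker.join()
    return accepted
```

The producers never take a pool lock; they only touch the buffer, and the single worker is the only writer to the pools.
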


