lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 8 Aug 2020 21:29:18 +0000
From:   George Spelvin <lkml@....ORG>
To:     Andy Lutomirski <>
Subject: Re: Flaw in "random32: update the net random state on interrupt and

On Sat, Aug 08, 2020 at 12:49:30PM -0700, Andy Lutomirski wrote:

> I don't care about throwing this stuff away. My plan (not quite 
> implemented yet) is to have a percpu RNG stream and to never to anything 
> resembling mixing anything in. The stream is periodically discarded and 
> reinitialized from the global "primary" pool instead.  The primary pool 
> has a global lock. We do some vaguely clever trickery to arrange for all 
> the percpu pools to reseed from the primary pool at different times.
> Meanwhile the primary pool gets reseeded by the input pool on a schedule 
> for catastrophic reseeding.

Sounds good to me.

> Do we really need 256 bits of key erasure?  I suppose if we only replace 
> half the key each time, we're just asking for some cryptographer to run 
> the numbers on a break-one-of-many attack and come up with something 
> vaguely alarming.

It's possible to have different levels of overall and key-erasure 
security, but I'm not sure what the point is.  It doesn't change the 
numbers *that* much.

(But yes, if you do it, I like the idea of arranging the key 
overwrite so all of the key gets replaced after two passes.)

> I wonder if we get good performance by spreading out the work. We could, 
> for example, have a 320 byte output buffer that get_random_bytes() uses 
> and a 320+32 byte ?next? buffer that is generated as the output buffer 
> is used. When we finish the output buffer, the first 320 bytes of the next 
> buffer becomes the current buffer and the extra 32 bytes becomes the new 
> key (or nonce).  This will have lower worst case latency, but it will 
> hit the cache lines more often, potentially hurting throughout.

You definitely lose something in locality of reference when you spread out 
the work, but you don't need a double-sized buffer (and the resultant 
D-cache hit). Every time you use up a block of current output, fill it 
with a block of next output.

The last 32 bytes of the buffer are the next key. When you've used up all 
of the current buffer but that, overwrite the last block of the current 
buffer with the next^2 key and start over at the beginning, outputting the 
was-next-now-current data.

On other words, with a 320-byte buffer, 320-32 = 288 bytes are available 
for output.  When we pass 64, 128, 256 and 288 bytes, there is a small 
latency spike to run one iteration of ChaCha.

The main issue is the latency between seeding and it affecting the output.  
In particular, I think people expect writes to /dev/random (RNDADDENTROPY) 
to affect subsequent reads immediately, so we'd need to invalidate & 
regenerate the buffer in that case.  We could do something with generation 
numbers so in-kernel users aren't affected.

(And remember that we don't have to fill the whole buffer.  If it's
early boot and we're expecting crng_init to increment, we could
pregenerate less.)

Powered by blists - more mailing lists