netdev - Re: Flaw in "random32: update the net random state on interrupt and activity"

Message-Id: <A92CFD64-176B-4DC2-9BF2-257F4EBBE901@amacapital.net>
Date:   Sat, 8 Aug 2020 12:49:30 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     George Spelvin <lkml@....org>
Cc:     netdev@...r.kernel.org, w@....eu, aksecurity@...il.com,
        torvalds@...ux-foundation.org, edumazet@...gle.com,
        Jason@...c4.com, luto@...nel.org, keescook@...omium.org,
        tglx@...utronix.de, peterz@...radead.org, tytso@....edu,
        lkml.mplumb@...il.com, stephen@...workplumber.org
Subject: Re: Flaw in "random32: update the net random state on interrupt and activity"

> On Aug 8, 2020, at 12:03 PM, George Spelvin <lkml@....org> wrote:
> 
> On Sat, Aug 08, 2020 at 10:07:51AM -0700, Andy Lutomirski wrote:
>>>   - Cryptographically strong ChaCha, batched
>>>   - Cryptographically strong ChaCha, with anti-backtracking.
>> 
>> I think we should just anti-backtrack everything.  With the "fast key 
>> erasure" construction, already implemented in my patchset for the 
>> buffered bytes, this is extremely fast.
> 
> The problem is that this is really *amortized* key erasure, and
> requires large buffers to amortize the cost down to a reasonable
> level.
> 
> E.g., if using 256-bit (32-byte) keys, 5% overhead would require generating
> 640 bytes at a time.
> 
> Are we okay with ~1K per core for this?  Which we might have to
> throw away occasionally to incorporate fresh seed material?

I don’t care about throwing this stuff away. My plan (not quite implemented yet) is to have a percpu RNG stream and to never do anything resembling mixing anything in. The stream is periodically discarded and reinitialized from the global “primary” pool instead.  The primary pool has a global lock. We do some vaguely clever trickery to arrange for all the percpu pools to reseed from the primary pool at different times.

Meanwhile the primary pool gets reseeded by the input pool on a schedule for catastrophic reseeding.
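
The message does not say what the "vaguely clever trickery" is. As a minimal sketch of one possibility (an assumption, not the planned implementation), each CPU could reseed on the same interval but with its own phase offset, so that no two percpu streams contend for the primary pool's lock at the same moment. The function name, the millisecond units, and the even spacing are all hypothetical.

```python
def next_reseed_ms(cpu: int, ncpus: int, interval_ms: int, now_ms: int) -> int:
    """Return the next reseed time for `cpu`, strictly after `now_ms`.

    Hypothetical scheme: every CPU reseeds once per `interval_ms`, but CPU i
    is shifted by i/ncpus of the interval so the reseeds are spread out.
    """
    offset = cpu * interval_ms // ncpus
    # Smallest t > now_ms such that t % interval_ms == offset.
    periods = (now_ms - offset) // interval_ms + 1
    return offset + periods * interval_ms


# Four CPUs, one-second interval: the deadlines land 250 ms apart.
deadlines = [next_reseed_ms(cpu, 4, 1000, 0) for cpu in range(4)]
```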

5% overhead to make a fresh ChaCha20 key each time sounds totally fine to me. The real issue is that the bigger we make this thing, the bigger the latency spike each time we run it.
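
A minimal sketch of the fast key erasure construction and of where the 5% figure comes from, under stated assumptions: keyed BLAKE2b in counter mode stands in for ChaCha20 (Python's standard library has no ChaCha20), the class and function names are hypothetical, and Python cannot actually wipe memory, so the "erasure" here only shows the order of operations.

```python
import hashlib

KEY_BYTES = 32      # 256-bit key, as in the thread
BLOCK_BYTES = 640   # bytes generated per refill, from the quoted 5% figure


def keystream(key: bytes, nbytes: int) -> bytes:
    """Stand-in stream generator: keyed BLAKE2b over a counter, NOT ChaCha20."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.blake2b(counter.to_bytes(8, "little"), key=key).digest()
        counter += 1
    return bytes(out[:nbytes])


class FastKeyErasureRng:
    def __init__(self, seed: bytes):
        assert len(seed) == KEY_BYTES
        self.key = seed
        self.buf = b""

    def _refill(self) -> None:
        block = keystream(self.key, BLOCK_BYTES)
        # The first KEY_BYTES overwrite the key right away, so the old key is
        # gone before any byte of this block is handed to a caller.
        self.key = block[:KEY_BYTES]
        self.buf = block[KEY_BYTES:]

    def get_random_bytes(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            if not self.buf:
                self._refill()
            take = min(n - len(out), len(self.buf))
            out += self.buf[:take]
            self.buf = self.buf[take:]  # drop bytes once they are handed out
        return bytes(out)


# Fraction of each generated block that is spent on the new key.
overhead = KEY_BYTES / BLOCK_BYTES
```

With these sizes, 32 of every 640 generated bytes go to the replacement key, which is the 5% overhead; the whole 640-byte refill runs in one go, which is the latency spike in question.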

Do we really need 256 bits of key erasure?  I suppose if we only replace half the key each time, we’re just asking for some cryptographer to run the numbers on a break-one-of-many attack and come up with something vaguely alarming.

I wonder if we would get good performance by spreading out the work. We could, for example, have a 320 byte output buffer that get_random_bytes() uses and a 320+32 byte “next” buffer that is generated as the output buffer is used. When we finish the output buffer, the first 320 bytes of the next buffer become the current buffer and the extra 32 bytes become the new key (or nonce).  This will have lower worst case latency, but it will hit the cache lines more often, potentially hurting throughput.
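
A rough sketch of that two-buffer idea, under stated assumptions: keyed BLAKE2b over a counter stands in for a ChaCha20 block, the 64-byte chunk size and all names are hypothetical, and the pacing rule (keep "next" in proportion to how much of the output buffer has been used) is just one way to spread the work.

```python
import hashlib

OUT_BYTES = 320                      # output buffer size from the message
KEY_BYTES = 32                       # extra bytes that become the new key
NEXT_BYTES = OUT_BYTES + KEY_BYTES   # the 320+32 byte "next" buffer


def prf_block(key: bytes, counter: int) -> bytes:
    """Stand-in for one ChaCha20 block: 64 bytes of keyed BLAKE2b output."""
    return hashlib.blake2b(counter.to_bytes(8, "little"), key=key).digest()


class SpreadRefillRng:
    def __init__(self, seed: bytes):
        self.key = seed
        self.nxt = bytearray()
        self.counter = 0
        self._swap()                 # pay the full cost once at start-up

    def _advance_next(self, target: int) -> None:
        # Generate whole chunks until "next" holds at least `target` bytes.
        while len(self.nxt) < min(target, NEXT_BYTES):
            self.nxt += prf_block(self.key, self.counter)
            self.counter += 1

    def _swap(self) -> None:
        self._advance_next(NEXT_BYTES)
        self.cur = bytearray(self.nxt[:OUT_BYTES])
        self.key = bytes(self.nxt[OUT_BYTES:NEXT_BYTES])
        self.pos = 0
        self.nxt = bytearray()
        self.counter = 0             # safe to restart: the key has changed

    def get_random_bytes(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            if self.pos == OUT_BYTES:
                self._swap()
            take = min(n - len(out), OUT_BYTES - self.pos)
            out += self.cur[self.pos:self.pos + take]
            self.cur[self.pos:self.pos + take] = bytes(take)  # wipe used bytes
            self.pos += take
            # Keep "next" roughly in step with how much of "cur" is used up.
            self._advance_next(self.pos * NEXT_BYTES // OUT_BYTES)
        return bytes(out)
```

A caller that takes a few bytes at a time triggers at most a chunk or two of generation per call instead of a whole refill, which is the lower worst-case latency; the cost is that every call touches both buffers.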