netdev - Re: Flaw in "random32: update the net random state on interrupt and activity"


Message-Id: <940D743C-4FDD-43B5-A129-840CFEBBD2F7@amacapital.net>
Date:   Fri, 7 Aug 2020 12:08:27 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Willy Tarreau <w@....eu>, Marc Plumb <lkml.mplumb@...il.com>,
        Theodore Ts'o <tytso@....edu>, Netdev <netdev@...r.kernel.org>,
        Amit Klein <aksecurity@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        stable <stable@...r.kernel.org>
Subject: Re: Flaw in "random32: update the net random state on interrupt and activity"

> On Aug 7, 2020, at 11:10 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Fri, Aug 7, 2020 at 10:55 AM Andy Lutomirski <luto@...capital.net> wrote:
>> 
>> I think the real random.c can run plenty fast. It’s ChaCha20 plus ludicrous overhead right now.
> 
> I doubt it.
> 
> I tried something very much like that in user space to just see how
> many cycles it ended up being.
> 
> I made a "just raw ChaCha20", and it was already much too slow for
> what some of the networking people claim to want.

Do you remember the numbers?

Certainly a full ChaCha20 per random number is too much, but AFAICT the network folks want 16 or 32 bits at a time, which is 1/32 or 1/16 of a 64-byte ChaCha20 block. DJB claims 4 cycles per byte on Core 2, and it had better be faster now, although we can’t usefully use XMM regs, so I don’t know the real timings.
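
A minimal sketch of the amortization argument above, not the actual random.c code. One ChaCha20 block is 64 bytes, so it holds sixteen 32-bit outputs; at the quoted 4 cycles per byte that would be roughly 256 cycles per block, or about 16 cycles per 32-bit output if the block is buffered and handed out a word at a time. The names `batched_rng`, `batched_rng_init`, `batched_rng_u32` and `fake_block` are hypothetical, and `fake_block` is a non-cryptographic stand-in for the block function so that the buffering logic can be shown on its own.

```c
#include <stdint.h>

#define BLOCK_WORDS 16  /* one 64-byte ChaCha20 block = 16 x 32-bit words */

/* Hypothetical batched generator: one block call serves 16 outputs. */
struct batched_rng {
    uint32_t buf[BLOCK_WORDS];  /* most recently generated block */
    unsigned used;              /* words of buf already handed out */
    uint64_t counter;           /* block counter fed to the block function */
    unsigned long refills;      /* how often the expensive step ran */
};

/*
 * Stand-in for the expensive block function. This is a splitmix64-style
 * mixer, NOT ChaCha20 and NOT cryptographically secure; it only makes the
 * sketch self-contained.
 */
static void fake_block(uint64_t counter, uint32_t out[BLOCK_WORDS])
{
    for (unsigned i = 0; i < BLOCK_WORDS; i++) {
        uint64_t z = counter * BLOCK_WORDS + i + 0x9e3779b97f4a7c15ULL;
        z ^= z >> 30; z *= 0xbf58476d1ce4e5b9ULL;
        z ^= z >> 27; z *= 0x94d049bb133111ebULL;
        z ^= z >> 31;
        out[i] = (uint32_t)(z >> 32);
    }
}

void batched_rng_init(struct batched_rng *r)
{
    r->used = BLOCK_WORDS;  /* empty buffer: first draw forces a refill */
    r->counter = 0;
    r->refills = 0;
}

/* Return 32 bits; pay for a full block only once every 16 calls. */
uint32_t batched_rng_u32(struct batched_rng *r)
{
    if (r->used == BLOCK_WORDS) {
        fake_block(r->counter++, r->buf);
        r->used = 0;
        r->refills++;
    }
    return r->buf[r->used++];
}
```

This sketch is single-threaded on purpose: the buffer and index are exactly the shared state that would need per-CPU handling or locking in a real implementation.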

But with the current code, the actual crypto will be lost in the noise.  That’s what I’m trying to fix.
> 
> Now, what *might* be acceptable is to not do ChaCha20, but simply do a
> single double-round of it.

We can certainly have a parallel RNG seeded by the main RNG that runs fewer rounds. I’ll do that if benchmarks say I’m still too slow.
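
A rough sketch of what such a parallel generator could look like, assuming the RFC 8439 state layout (4 constant words, 8 key words, 1 counter word, 3 nonce words). The function names and the choice of a single double-round for the fast path are assumptions made for illustration, not anything taken from random.c, and the sketch makes no claim that one double-round is strong enough.

```c
#include <stdint.h>
#include <string.h>

#define FULL_DOUBLE_ROUNDS 10  /* ChaCha20 = 10 double-rounds */
#define FAST_DOUBLE_ROUNDS 1   /* hypothetical reduced-round setting */

static uint32_t rotl32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

#define QR(a, b, c, d) do {                     \
    a += b; d ^= a; d = rotl32(d, 16);          \
    c += d; b ^= c; b = rotl32(b, 12);          \
    a += b; d ^= a; d = rotl32(d, 8);           \
    c += d; b ^= c; b = rotl32(b, 7);           \
} while (0)

/* ChaCha block function with a configurable number of double-rounds. */
void chacha_block(const uint32_t in[16], uint32_t out[16], int double_rounds)
{
    uint32_t x[16];

    memcpy(x, in, sizeof(x));
    for (int i = 0; i < double_rounds; i++) {
        /* column round */
        QR(x[0], x[4], x[8],  x[12]);
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        /* diagonal round */
        QR(x[0], x[5], x[10], x[15]);
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
    }
    for (int i = 0; i < 16; i++)
        out[i] = x[i] + in[i];  /* feed-forward of the input state */
}

/* Key the fast generator from one full-strength block of the main one. */
void seed_fast_from_main(uint32_t main_state[16], uint32_t fast_state[16])
{
    static const uint32_t sigma[4] = {
        0x61707865, 0x3320646e, 0x79622d32, 0x6b206574  /* "expand 32-byte k" */
    };
    uint32_t ks[16];

    chacha_block(main_state, ks, FULL_DOUBLE_ROUNDS);
    main_state[12]++;                        /* never reuse a main block */

    memcpy(&fast_state[0], sigma, sizeof(sigma));
    memcpy(&fast_state[4], ks, 8 * sizeof(uint32_t));  /* 256-bit key */
    memset(&fast_state[12], 0, 4 * sizeof(uint32_t));  /* counter, nonce */
}

/* Produce one reduced-round block and advance the fast counter. */
void fast_block(uint32_t fast_state[16], uint32_t out[16])
{
    chacha_block(fast_state, out, FAST_DOUBLE_ROUNDS);
    fast_state[12]++;
}
```

Keeping the round count as a parameter of one block function means the benchmark question (how many rounds are affordable) can be answered by changing a constant rather than maintaining two implementations.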

All of this is trivial except the locking. If I’m writing this code, I personally refuse to use the “races just make it more random” strategy. I’m going to do it without data races, and this will take a bit of work.