netdev - Re: Flaw in "random32: update the net random state on interrupt and activity"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whPgKZRfK_Kfo6Oo+Aek-Z_U_Dxv9Y3HuNuHb5t=jLbcA@mail.gmail.com>
Date:   Fri, 7 Aug 2020 12:56:57 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Willy Tarreau <w@....eu>, Marc Plumb <lkml.mplumb@...il.com>,
        "Theodore Ts'o" <tytso@....edu>, Netdev <netdev@...r.kernel.org>,
        Amit Klein <aksecurity@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        stable <stable@...r.kernel.org>
Subject: Re: Flaw in "random32: update the net random state on interrupt and activity"

On Fri, Aug 7, 2020 at 12:33 PM Andy Lutomirski <luto@...capital.net> wrote:
>
> No one said we have to do only one ChaCha20 block per slow path hit.

Sure, doing more might be better for amortizing the cost.

But you have to be very careful about latency spikes. I would be
*really* nervous about doing a whole page at a time, when this is
called from routines that literally expect it to be less than 50
cycles.

So I would seriously suggest you look at a much smaller buffer. Maybe
not a single block, but definitely not multiple kB either.

Maybe something like 2 cachelines might be ok, but there's a reason
the current code only works with 16 bytes (or whatever) and only does
simple operations with no looping.

That's why I think you might look at a single double-round ChaCha20
instead. Maybe do it for two blocks - by the time you wrap around,
you'll have done more than a full ChaCaa20.

That would imnsho *much* better than doing some big block, and have
huge latency spikes and flush a large portion of your L1 when they
happen. Nasty nasty behavior.

I really think the whole "we can amortize it with bigger blocks" is
complete and utter garbage. It's classic "benchmarketing" crap.

             Linus