lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 7 Aug 2020 12:56:57 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Willy Tarreau <w@....eu>, Marc Plumb <lkml.mplumb@...il.com>,
        "Theodore Ts'o" <tytso@....edu>, Netdev <netdev@...r.kernel.org>,
        Amit Klein <aksecurity@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        stable <stable@...r.kernel.org>
Subject: Re: Flaw in "random32: update the net random state on interrupt and activity"

On Fri, Aug 7, 2020 at 12:33 PM Andy Lutomirski <luto@...capital.net> wrote:
>
> No one said we have to do only one ChaCha20 block per slow path hit.

Sure, doing more might be better for amortizing the cost.

But you have to be very careful about latency spikes. I would be
*really* nervous about doing a whole page at a time, when this is
called from routines that literally expect it to be less than 50
cycles.

So I would seriously suggest you look at a much smaller buffer. Maybe
not a single block, but definitely not multiple kB either.

Maybe something like 2 cachelines might be ok, but there's a reason
the current code only works with 16 bytes (or whatever) and only does
simple operations with no looping.

That's why I think you might look at a single double-round ChaCha20
instead. Maybe do it for two blocks - by the time you wrap around,
you'll have done more than a full ChaCaa20.

That would imnsho *much* better than doing some big block, and have
huge latency spikes and flush a large portion of your L1 when they
happen. Nasty nasty behavior.

I really think the whole "we can amortize it with bigger blocks" is
complete and utter garbage. It's classic "benchmarketing" crap.

             Linus

Powered by blists - more mailing lists