lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 7 Aug 2020 12:33:33 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Willy Tarreau <w@....eu>, Marc Plumb <lkml.mplumb@...il.com>,
        Theodore Ts'o <tytso@....edu>, Netdev <netdev@...r.kernel.org>,
        Amit Klein <aksecurity@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        stable <stable@...r.kernel.org>
Subject: Re: Flaw in "random32: update the net random state on interrupt and activity"



> On Aug 7, 2020, at 12:21 PM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Fri, Aug 7, 2020 at 12:08 PM Andy Lutomirski <luto@...capital.net> wrote:
>> 4 cycles per byte on Core 2
> 
> I took the reference C implementation as-is, and just compiled it with
> O2, so my numbers may not be what some heavily optimized case does.
> 
> But it was way more than that, even when amortizing for "only need to
> do it every 8 cases". I think the 4 cycles/byte might be some "zero
> branch mispredicts" case when you've fully unrolled the thing, but
> then you'll be taking I$ misses out of the wazoo, since by definition
> this won't be in your L1 I$ at all (only called every 8 times).
> 
> Sure, it might look ok on microbenchmarks where it does stay hot the
> cache all the time, but that's not realistic. I

No one said we have to do only one ChaCha20 block per slow path hit.  In fact, the more we reduce the number of rounds, the more time we spend on I$ misses, branch mispredictions, etc, so reducing rounds may be barking up the wrong tree entirely.  We probably don’t want to have more than one page 

I wonder if AES-NI adds any value here.  AES-CTR is almost a drop-in replacement for ChaCha20, and maybe the performance for a cache-cold short run is better.

Powered by blists - more mailing lists