lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 7 Aug 2020 13:16:20 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Willy Tarreau <w@....eu>, Marc Plumb <lkml.mplumb@...il.com>,
        Theodore Ts'o <tytso@....edu>, Netdev <netdev@...r.kernel.org>,
        Amit Klein <aksecurity@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        stable <stable@...r.kernel.org>
Subject: Re: Flaw in "random32: update the net random state on interrupt and activity"



> On Aug 7, 2020, at 12:57 PM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Fri, Aug 7, 2020 at 12:33 PM Andy Lutomirski <luto@...capital.net> wrote:
>> 
>> No one said we have to do only one ChaCha20 block per slow path hit.
> 
> Sure, doing more might be better for amortizing the cost.
> 
> But you have to be very careful about latency spikes. I would be
> *really* nervous about doing a whole page at a time, when this is
> called from routines that literally expect it to be less than 50
> cycles.
> 
> So I would seriously suggest you look at a much smaller buffer. Maybe
> not a single block, but definitely not multiple kB either.
> 
> Maybe something like 2 cachelines might be ok, but there's a reason
> the current code only works with 16 bytes (or whatever) and only does
> simple operations with no looping.
> 
> That's why I think you might look at a single double-round ChaCha20
> instead. Maybe do it for two blocks - by the time you wrap around,
> you'll have done more than a full ChaCaa20.
> 
> That would imnsho *much* better than doing some big block, and have
> huge latency spikes and flush a large portion of your L1 when they
> happen. Nasty nasty behavior.
> 
> I really think the whole "we can amortize it with bigger blocks" is
> complete and utter garbage. It's classic "benchmarketing" crap.
> 

I think this will come down to actual measurements :). If the cost of one block of cache-cold ChaCha20 is 100 cycles of actual computation and 200 cycles of various cache misses, then let’s do more than one block.

I’ll get something working and we’ll see. 

Powered by blists - more mailing lists