netdev - Re: Flaw in "random32: update the net random state on interrupt and activity"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200808180835.GA7480@1wt.eu>
Date:   Sat, 8 Aug 2020 20:08:35 +0200
From:   Willy Tarreau <w@....eu>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     George Spelvin <lkml@....org>, netdev@...r.kernel.org,
        aksecurity@...il.com, torvalds@...ux-foundation.org,
        edumazet@...gle.com, Jason@...c4.com, luto@...nel.org,
        keescook@...omium.org, tglx@...utronix.de, peterz@...radead.org,
        tytso@....edu, lkml.mplumb@...il.com, stephen@...workplumber.org
Subject: Re: Flaw in "random32: update the net random state on interrupt and
 activity"

On Sat, Aug 08, 2020 at 10:07:51AM -0700, Andy Lutomirski wrote:
> 
> > On Aug 8, 2020, at 8:29 AM, George Spelvin <lkml@....org> wrote:
> > 
> 
> > And apparently switching to the fastest secure PRNG currently
> > in the kernel (get_random_u32() using ChaCha + per-CPU buffers)
> > would cause too much performance penalty.
> 
> Can someone explain *why* the slow path latency is particularly relevant
> here?  What workload has the net code generating random numbers in a place
> where even a whole microsecond is a problem as long as the amortized cost is
> low?  (I'm not saying I won't believe this matters, but it's not obvious to
> me that it matters.)

It really depends on what is being done and how we want that code to
continue to be used. Often it will be a matter of queue sizes or
concurrency in certain contexts under a lock. For illustration let's
say you want to use randoms to choose a sequence number for a SYN
cookie (it's not what is done today), a 40G NIC can deliver 60Mpps
or one packet every 16 ns. If you periodically add 1us you end up
queuing 60 extra packets in a queue when this happens. This might
or might not be acceptable. Regarding concurrency, you could imagine
some code picking a random in a small function running under a
spinlock. Frequently adding 1us there can be expensive.

To be clear, I'm not *that* much worried about *some* extra latency,
however I'm extremely worried about the risks of increase once
some security researcher considers the PRNG not secure enough again
(especially once it starts to be used for security purposes after
having being claimed secure) and we replace it with another one
showing a higher cost and longer tail to amortize the cost. And given
that we're speaking about replacing a non-secure PRNG with *something*,
it doesn't seem really wise to start to introduce new constraints on
the data path when there are probably a large enough number of
possibilities which do not require these constraints.

Last point, we must not rule out the possibility than some clever
researcher will later come up with new time-based attacks consisting
in observing latencies of packet processing, explaining how they can
count the number of connections reaching a given host and reveal
certain useful information, and that we're then forced to slow down
all of them to proceed in constant time.

Just my two cents,
Willy