Date:   Sun, 9 Aug 2020 18:30:17 +0000
From:   George Spelvin <lkml@....ORG>
To:     Willy Tarreau <w@....eu>
Cc:     netdev@...r.kernel.org, aksecurity@...il.com,
        torvalds@...ux-foundation.org, edumazet@...gle.com,
        Jason@...c4.com, luto@...nel.org, keescook@...omium.org,
        tglx@...utronix.de, peterz@...radead.org, tytso@....edu,
        lkml.mplumb@...il.com, stephen@...workplumber.org, fw@...len.de,
        George Spelvin <lkml@....ORG>
Subject: Re: [DRAFT PATCH] random32: make prandom_u32() output unpredictable

On Sun, Aug 09, 2020 at 07:33:03PM +0200, Willy Tarreau wrote:
> Not that low in fact because they don't know precisely when the call is
> made. I mean, let's say we're in the worst case, with two VMs running on
> two siblings of the same core, with the same TSC, on a 3 GHz machine. The
> attacker can stress the victim at 100k probes per second. That's still
> 15 bits of uncertainty on the TSC value estimation which is added to each
> call. Even on the first call this is enough to make a source port
> unguessable, and preventing the attacker from staying synchronized with
> its victim. And I'm only speaking about an idle remote machine, not even
> one taking unobservable traffic, which further adds to the difficulty.

I'm trying to understand your attack scenario.  I'm assuming that an
attacker can call prandom_u32() locally.  (I don't have a specific code
path, but given the number of uses in the kernel, I assume *one* of
them will leak the output directly.)  And repeat the call fast
enough that there's at most *one* other user between our calls.

If an attacker knows the initial state, does an rdtsc, prandom_u32(),
and a second rdtsc, then they can guess the TSC value used in that
prandom_u32() quite accurately (4-6 bits of fuzz, perhaps).  This is
trivial to brute force.

The fun comes if someone else does a prandom_u32() call in between.

All of a sudden, the 4-6 bit brute force of one get_cycles() value
fails to find a solution.  Someone else has called prandom_u32()!

Now we have 15 bits of uncertainty about that other call, and 5 bits
of uncertainty about our call.  2^20 possibilities only takes a few
milliseconds to test, and the 32-bit output of prandom_u32() can
verify a guess with minimal probability of error.
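
As a rough illustration of that two-timestamp search, here is a toy
sketch in Python.  It is not kernel code: toy_step() is a hypothetical
stand-in for prandom_u32() (a 64-bit state mixed with a timestamp using
the MurmurHash3 finalizer), and the window bases and widths are made-up
parameters.  It only shows the shape of the search, with the 32-bit
output acting as the check on each guess.

```python
MASK64 = (1 << 64) - 1

def toy_step(state, tsc):
    """Hypothetical stand-in for one prandom_u32() call.

    Adds a timestamp to the state, scrambles it with the MurmurHash3
    64-bit finalizer, and returns (new_state, 32-bit output).
    """
    x = (state + tsc) & MASK64
    x ^= x >> 33
    x = (x * 0xff51afd7ed558ccd) & MASK64
    x ^= x >> 33
    x = (x * 0xc4ceb9fe1a85ec53) & MASK64
    x ^= x >> 33
    return x, x >> 32

def brute_force(state, observed, other_base, other_bits, own_base, own_bits):
    """Try every (unobserved call, own call) timestamp pair.

    Returns all (t_other, t_own, new_state) candidates whose 32-bit
    output matches what the attacker observed.
    """
    matches = []
    for t_other in range(other_base, other_base + (1 << other_bits)):
        mid, _ = toy_step(state, t_other)          # someone else's call
        for t_own in range(own_base, own_base + (1 << own_bits)):
            new_state, out = toy_step(mid, t_own)  # attacker's own call
            if out == observed:
                matches.append((t_other, t_own, new_state))
    return matches
```

With other_bits=15 and own_bits=5 the loops cover the 2^20 candidates
discussed above; smaller widths keep a pure-Python run quick.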

(Note that, to maintain tracking, we have to keep hammering
prandom_u32() *during* the search, but we can just buffer the results
and process them after the expensive search is complete.)

What you can see here is the incredible power of *multiple* unobserved
seedings.  As long as an attacker can limit things to one unobserved
prandom_u32(), it's a simple brute force.

If there is more than one, the additional bits of uncertainty
quickly make things impractical.

This is why I'm so keen on less frequent, more catastrophic,
reseeding.  Yes, the delay means an attacker who has captured
the state retains full knowledge for longer.  But they get
kicked off as soon as the catastrophe happens.  Without it, they
can keep tracking the state indefinitely.

Even something simple like buffering 8 TSC samples, and adding them
at 32-bit offsets across the state every 8th call, would make a huge
difference.
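
One possible reading of that buffering scheme, as a toy Python sketch.
The 256-bit state width, the class name and the method name are
assumptions made for illustration, and "adding at 32-bit offsets" is
interpreted here as adding sample i at bit offset 32*i; this is not a
proposed kernel patch.

```python
STATE_BITS = 256                  # assumed state width (8 x 32 bits)
STATE_MASK = (1 << STATE_BITS) - 1
BATCH = 8                         # reseed once every 8th call

class BatchedReseeder:
    """Hypothetical helper: collect timestamps, mix them in as a batch."""

    def __init__(self, state):
        self.state = state & STATE_MASK
        self.pending = []

    def note_call(self, tsc):
        """Record one timestamp; return True if a batch reseed happened."""
        self.pending.append(tsc)
        if len(self.pending) < BATCH:
            return False          # state untouched until the batch is full
        for i, sample in enumerate(self.pending):
            # Add sample i at bit offset 32*i, wrapping at the state width.
            self.state = (self.state + (sample << (32 * i))) & STATE_MASK
        self.pending.clear()
        return True
```

The point of the sketch is only that the state does not change between
batches, so an observer sees all eight unknown timestamps take effect
at once rather than one at a time.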

Even if 7 of the timestamps are from attacker calls (4 bits
uncertainty), the time of the target call is 8x less known
(so it goes from 15 to 18 bits), and the total becomes
46 bits.  *So* much better.
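
The arithmetic behind that total can be written out as a small Python
sketch.  The function and parameter names are made up for illustration;
the figures (4 bits per attacker call, 15 bits for the target call, a
batch of 8) are the ones used above.

```python
import math

def batch_uncertainty_bits(batch=8, attacker_bits=4, target_bits=15):
    """Total unknown bits when one target call hides in a batch.

    The target call's timing window is `batch` times wider, which adds
    log2(batch) bits; the other batch-1 calls are the attacker's own.
    """
    widened_target = target_bits + math.log2(batch)   # 15 + 3 = 18
    return (batch - 1) * attacker_bits + widened_target

print(batch_uncertainty_bits())   # 7*4 + 18 = 46.0
```

Compare this with the 15 + 5 = 20 bits of the single-unobserved-call
case: the search space grows from 2^20 to 2^46.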

> I can run some tests on this as
> well. I'd really need to try on a cleaner setup, I have remote machines
> at the office but I don't feel safe enough to remotely reboot them and
> risk to lose them :-/

Yeah, test kernels are nervous-making that way.

> I'll also test on arm/arm64 to make sure we don't introduce a significant
> cost there.

I don't expect a problem, but SipHash is optimized for 4-issue processors,
and on narrower machines fewer instructions are "free".
