Date:   Tue, 28 May 2019 12:28:53 +0000
From:   "Reshetova, Elena" <elena.reshetova@...el.com>
To:     Kees Cook <keescook@...omium.org>, Ingo Molnar <mingo@...nel.org>,
        "Andy Lutomirski" <luto@...nel.org>
CC:     David Laight <David.Laight@...lab.com>,
        Theodore Ts'o <tytso@....edu>,
        Eric Biggers <ebiggers3@...il.com>,
        "ebiggers@...gle.com" <ebiggers@...gle.com>,
        "herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
        Peter Zijlstra <peterz@...radead.org>,
        "Daniel Borkmann" <daniel@...earbox.net>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jpoimboe@...hat.com" <jpoimboe@...hat.com>,
        "jannh@...gle.com" <jannh@...gle.com>,
        "Perla, Enrico" <enrico.perla@...el.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

> > With 5 bits there's a ~96.9% chance of crashing the system in an attempt,
> > the exploit cannot be used for a range of attacks, including spear
> > attacks and fast-spreading worms, right? A crashed and inaccessible
> > system also increases the odds of leaving around unfinished attack code
> > and leaking a zero-day attack.
> 
> Yup, which is why I'd like to have _something_ here without us getting
> lost in the "perfect entropy" weeds. :)

I am really starting to believe that we cannot make good randomness sources behave
fast enough for per-syscall usage if our target is 1-2% overhead under the worst possible
(and potentially unrealistic) scenario: a stress test of some simple syscall like getpid().
The only thing that fits within that margin is indeed rdtsc().
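
To make this concrete, the kind of worst-case stress test I have in mind is roughly the following
userspace loop (illustrative only, not the exact harness used for the numbers later in this mail):

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	const long iters = 10 * 1000 * 1000;
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iters; i++)
		syscall(SYS_getpid);	/* raw syscall, bypasses any libc caching */
	clock_gettime(CLOCK_MONOTONIC, &end);

	double ns = (end.tv_sec - start.tv_sec) * 1e9 +
		    (end.tv_nsec - start.tv_nsec);
	printf("Simple syscall: %.4f microseconds\n", ns / iters / 1000.0);
	return 0;
}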

I profiled the path in use with get_random_bytes() and the results look like this
(arch_get_random_long is not inlined here for measurement purposes):

               |          |          |           --9.44%--random_get_byte
               |          |          |                     |
               |          |          |                      --8.08%--get_random_bytes
               |          |          |                                |
               |          |          |                                 --7.80%--_extract_crng.constprop.45
               |          |          |                                           |
               |          |          |                                           |--4.95%--arch_get_random_long
               |          |          |                                           |
               |          |          |                                            --2.39%--chacha_block


And here is the proof that under such usage _extract_crng bottlenecks on rdrand: 

PerfTop:    5877 irqs/sec  kernel:78.6%  exact: 100.0% [4000Hz cycles:ppp],  (all, 8 CPUs)
------------------------------------------------------------------------------------------------------------------------------------
Showing cycles:ppp for _extract_crng.constprop.46
  Events  Pcnt (>=5%)
 Percent | Source code & Disassembly of kcore for cycles:ppp (2104 samples, percent: local period)
-------------------------------------------------------------------------------------------------------
    0.00 :   ffffffff9abd1a62:       mov    $0xa,%edx
   97.94 :   ffffffff9abd1a67:       rdrand %rax

And then of course there is the chacha permutation itself. So I think Andy's proposal to rewrite
get_random_bytes() for speed is not so easy to implement.

So, given that all we want is to raise the bar for attackers trying to predict the stack location
on a subsequent syscall, is it really worth trying to come up with more complex solutions than
just using the lower bits of rdtsc() by default?
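
For reference, the rdtsc() variant really is only a few instructions on the syscall path. A minimal
sketch of what I mean, with a made-up macro name and an alloca() trick to consume the offset
(the exact plumbing into the entry code is a separate question):

#include <asm/msr.h>		/* rdtsc() */

/*
 * Sketch only: use 5 low bits of the TSC as the per-syscall stack offset
 * and consume them by moving the stack pointer down. Must expand inside
 * the syscall-handling function so the shifted stack stays in effect for
 * the duration of the syscall.
 */
#define randomize_kstack_offset_rdtsc() do {				\
	unsigned long __offset = rdtsc() & 0x1f;  /* 5 bits */		\
	asm volatile("" : : "r" (__builtin_alloca(__offset))		\
		     : "memory");  /* keep the alloca() alive */	\
} while (0)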

One idea that was suggested to me last week is to create a pool of good randomness and
then during the syscall select a number from the pool using something like rdtsc() % POOL_SIZE.
The pool would need to be refilled periodically, outside of the syscall path, to maintain diversity.
I can try this approach if people believe that it would address the security concerns around
rdtsc() (my personal feeling is that one can still mount a timing attack against this, if we assume
that rdtsc() can be attacked, and that the complexity of the whole thing increases considerably).
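
Roughly what I have in mind, as a sketch with made-up names (pool size, refill frequency and any
locking are hand-waved):

#include <linux/types.h>
#include <linux/random.h>	/* get_random_bytes() */
#include <asm/msr.h>		/* rdtsc() */

#define KSTACK_RND_POOL_SIZE 256

/* A small pool of CRNG output, refilled outside of the syscall path. */
static u8 kstack_rnd_pool[KSTACK_RND_POOL_SIZE];

/* Called periodically, e.g. from a timer or workqueue, never per syscall. */
static void kstack_rnd_refill(void)
{
	get_random_bytes(kstack_rnd_pool, sizeof(kstack_rnd_pool));
}

/* Per-syscall cost is just an index into the pool, using low TSC bits. */
static unsigned long choose_kstack_offset(void)
{
	return kstack_rnd_pool[rdtsc() % KSTACK_RND_POOL_SIZE] & 0x1f;
}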

If we decide that this is too much trouble for the mere 5 bits of randomness we need per syscall, I would
still propose we reconsider the original rdtsc() approach, since it is still better than nothing.
We can have the whole thing on three levels (see the sketch after this list):

CONFIG_RANDOMIZE_KSTACK_OFFSET off: no randomization, like now
CONFIG_RANDOMIZE_KSTACK_OFFSET on, using rdtsc(): fast, better than nothing, but prone to timing attacks
CONFIG_RANDOMIZE_KSTACK_OFFSET on, based on get_random_bytes(): better security guarantees, but slower
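
The choice of source could then be compiled in along these lines (a rough sketch; only
CONFIG_RANDOMIZE_KSTACK_OFFSET exists in the patch, the _RDTSC variant name is made up, and
choose_kstack_offset() is the same hypothetical helper as in the pool sketch above):

#include <linux/types.h>
#include <linux/random.h>	/* get_random_bytes() */
#include <asm/msr.h>		/* rdtsc() */

/* Sketch only: pick the offset source at build time. */
static unsigned long choose_kstack_offset(void)
{
#if !defined(CONFIG_RANDOMIZE_KSTACK_OFFSET)
	return 0;			/* no randomization, like now */
#elif defined(CONFIG_RANDOMIZE_KSTACK_OFFSET_RDTSC)
	return rdtsc() & 0x1f;		/* fast, but prone to timing attacks */
#else
	u8 byte;

	get_random_bytes(&byte, 1);	/* stronger guarantees, but slower */
	return byte & 0x1f;
#endif
}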

Performance numbers for these options will look approximately like this:

No randomization: 			Simple syscall: 0.0534 microseconds
With rdtsc():				Simple syscall: 0.0539 microseconds
With get_random_bytes() (4096-byte buffer):	Simple syscall: 0.0597 microseconds

The pure rdrand option, calling rdrand_long() every 10th syscall, is considerably slower:

With rdrand (every 10th syscall):		Simple syscall: 0.0719 microseconds

And I guess we should once again remember that these are *not* the numbers that real
users will see in practice, since I doubt real workloads issue millions of *very
lightweight* syscalls in a loop, so these are really more "theoretical, worst case ever" numbers.

If someone could actually propose a reasonable *practical* workload to measure with,
then we can see what the overhead on it is, both for rdtsc() and for get_random_bytes().

Best Regards,
Elena.

