Message-ID: <2236FBA76BA1254E88B949DDB74E612BA4C203E8@IRSMSX102.ger.corp.intel.com>
Date: Fri, 29 Mar 2019 07:50:05 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: 'Kees Cook' <keescook@...omium.org>,
Andy Lutomirski <luto@...capital.net>
CC: Andy Lutomirski <luto@...nel.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Jann Horn <jannh@...gle.com>,
"Perla, Enrico" <enrico.perla@...el.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"Thomas Gleixner" <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
"Peter Zijlstra" <peterz@...radead.org>,
Greg KH <gregkh@...uxfoundation.org>
Subject: RE: [RFC PATCH] x86/entry/64: randomize kernel stack offset upon
syscall
> On Thu, Mar 28, 2019 at 9:29 AM Andy Lutomirski <luto@...capital.net> wrote:
> > Doesn’t this just leak some of the canary to user code through side channels?
>
> Erf, yes, good point. Let's just use prandom and be done with it.
And here I have some numbers on this: prandom turned out to be pretty fast,
even when called on every syscall. See the numbers below:
1) lmbench: ./lat_syscall -N 1000000 null
base: Simple syscall: 0.1774 microseconds
random_offset (prandom_u32() every syscall): Simple syscall: 0.1822 microseconds
random_offset (prandom_u32() every 4th syscall): Simple syscall: 0.1844 microseconds
2) Andy's tests, misc-tests: ./timing_test_64 10M sys_enosys
base: 10000000 loops in 1.62224s = 162.22 nsec / loop
random_offset (prandom_u32() every syscall): 10000000 loops in 1.64660s = 164.66 nsec / loop
random_offset (prandom_u32() every 4th syscall): 10000000 loops in 1.69300s = 169.30 nsec / loop
The second case is the variant where prandom is called only once every 4 syscalls
and the unused random bits are preserved in a per-cpu buffer. As you can see, it is
actually slower (modulo my maybe-not-so-optimized code in prandom, see below) than
calling it every time, so I would vote for calling it on every syscall: that saves
the hassle and avoids the additional code in prandom.
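(The every-syscall variant essentially amounts to something like the sketch below;
the helper name, the mask value and the alloca trick are illustrative only, not the
actual entry-code change from the RFC patch.)

#include <linux/random.h>

/*
 * Illustrative sketch: pick a small, 8-byte-aligned random offset and move
 * the stack pointer by it before dispatching the syscall.  This must be
 * inlined into the dispatch path, otherwise the alloca() only shifts this
 * helper's own frame.
 */
static __always_inline void randomize_kstack_offset_sketch(void)
{
        u32 offset = prandom_u32() & 0x3f8;     /* one prandom_u32() per syscall */
        char *sp = __builtin_alloca(offset);    /* shift the stack pointer down */

        asm volatile("" : : "r" (sp));          /* keep the alloca from being optimized out */
}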
And below is what I was calling instead of prandom_u32() to preserve the unused
random bits (net_rand_state_buffer is a new per-cpu buffer I added to store them).
I didn't include the check for bytes >= sizeof(u32), since this was just a PoC to
test the base speed, but for the generic case it would be needed (a rough sketch
of that follows the snippet).
+void prandom_bytes_preserve(void *buf, size_t bytes)
+{
+        u32 *buffer = &get_cpu_var(net_rand_state_buffer);
+        u8 *ptr = buf;
+
+        if (!(*buffer)) {
+                /* Buffer is empty: draw a fresh u32 and hand out its bytes. */
+                struct rnd_state *state = &get_cpu_var(net_rand_state);
+
+                if (bytes > 0) {
+                        *buffer = prandom_u32_state(state);
+                        do {
+                                *ptr++ = (u8) *buffer;
+                                bytes--;
+                                *buffer >>= BITS_PER_BYTE;
+                        } while (bytes > 0);
+                }
+                put_cpu_var(net_rand_state);
+                put_cpu_var(net_rand_state_buffer);
+        } else {
+                /* Serve the request from the bits left over in the buffer. */
+                if (bytes > 0) {
+                        do {
+                                *ptr++ = (u8) *buffer;
+                                bytes--;
+                                *buffer >>= BITS_PER_BYTE;
+                        } while (bytes > 0);
+                }
+                put_cpu_var(net_rand_state_buffer);
+        }
+}
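(For the generic case mentioned above, the byte-copy loop would roughly need to
refill the buffer whenever it runs dry. An untested sketch, reusing the variables
from the function above and assuming net_rand_state has been fetched with
get_cpu_var() as in the first branch:)

        /* Serve requests of any size by refilling the 32-bit per-cpu
         * buffer from prandom_u32_state() whenever it runs out of bits. */
        while (bytes > 0) {
                if (!(*buffer))
                        *buffer = prandom_u32_state(state);
                *ptr++ = (u8) *buffer;
                *buffer >>= BITS_PER_BYTE;
                bytes--;
        }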
I will send the first version of the patch (calling prandom_u32() every time)
shortly, in case anyone wants to double-check the performance implications.
Best Regards,
Elena.