Message-ID: <2236FBA76BA1254E88B949DDB74E612BA4C203E8@IRSMSX102.ger.corp.intel.com>
Date: Fri, 29 Mar 2019 07:50:05 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: 'Kees Cook' <keescook@...omium.org>,
Andy Lutomirski <luto@...capital.net>
CC: Andy Lutomirski <luto@...nel.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Jann Horn <jannh@...gle.com>,
"Perla, Enrico" <enrico.perla@...el.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"Thomas Gleixner" <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
"Peter Zijlstra" <peterz@...radead.org>,
Greg KH <gregkh@...uxfoundation.org>
Subject: RE: [RFC PATCH] x86/entry/64: randomize kernel stack offset upon
syscall
> On Thu, Mar 28, 2019 at 9:29 AM Andy Lutomirski <luto@...capital.net> wrote:
> > Doesn’t this just leak some of the canary to user code through side channels?
>
> Erf, yes, good point. Let's just use prandom and be done with it.
And here I have some numbers on this: prandom turned out to be pretty fast,
even when called on every syscall. See the numbers below:
1) lmbench: ./lat_syscall -N 1000000 null
base: Simple syscall: 0.1774 microseconds
random_offset (prandom_u32() every syscall): Simple syscall: 0.1822 microseconds
random_offset (prandom_u32() every 4th syscall): Simple syscall: 0.1844 microseconds
2) Andy's tests, misc-tests: ./timing_test_64 10M sys_enosys
base: 10000000 loops in 1.62224s = 162.22 nsec / loop
random_offset (prandom_u32() every syscall): 10000000 loops in 1.64660s = 164.66 nsec / loop
random_offset (prandom_u32() every 4th syscall): 10000000 loops in 1.69300s = 169.30 nsec / loop
The second case is the variant where prandom is called only once every 4 syscalls
and the unused random bits are preserved in a per-cpu buffer. As you can see, it is
actually slower (modulo my maybe-not-so-optimized code in prandom, see below) than
calling it every time, so I would vote for calling it on every syscall: that saves
the hassle and avoids the additional code in prandom.
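(The every-syscall variant essentially amounts to something like the sketch below;
the helper name, the mask value and the alloca trick are illustrative only, not the
actual entry-code change from the RFC patch.)

#include <linux/random.h>

/*
 * Illustrative sketch: pick a small, 8-byte-aligned random offset and move
 * the stack pointer by it before dispatching the syscall.  This must be
 * inlined into the dispatch path, otherwise the alloca() only shifts this
 * helper's own frame.
 */
static __always_inline void randomize_kstack_offset_sketch(void)
{
        u32 offset = prandom_u32() & 0x3f8;     /* one prandom_u32() per syscall */
        char *sp = __builtin_alloca(offset);    /* shift the stack pointer down */

        asm volatile("" : : "r" (sp));          /* keep the alloca from being optimized out */
}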
And below is what I was calling instead of prandom_u32() to preserve the unused
random bits (net_rand_state_buffer is a new per-cpu buffer I added to store them).
I didn't include the check for bytes >= sizeof(u32), since this was just a PoC to
test the base speed, but for the generic case it would be needed (a rough sketch
of that follows the snippet).
+void prandom_bytes_preserve(void *buf, size_t bytes)
+{
+        u32 *buffer = &get_cpu_var(net_rand_state_buffer);
+        u8 *ptr = buf;
+
+        if (!(*buffer)) {
+                /* Buffer is empty: draw a fresh u32 and hand out its bytes. */
+                struct rnd_state *state = &get_cpu_var(net_rand_state);
+
+                if (bytes > 0) {
+                        *buffer = prandom_u32_state(state);
+                        do {
+                                *ptr++ = (u8) *buffer;
+                                bytes--;
+                                *buffer >>= BITS_PER_BYTE;
+                        } while (bytes > 0);
+                }
+                put_cpu_var(net_rand_state);
+                put_cpu_var(net_rand_state_buffer);
+        } else {
+                /* Serve the request from the bits left over in the buffer. */
+                if (bytes > 0) {
+                        do {
+                                *ptr++ = (u8) *buffer;
+                                bytes--;
+                                *buffer >>= BITS_PER_BYTE;
+                        } while (bytes > 0);
+                }
+                put_cpu_var(net_rand_state_buffer);
+        }
+}
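(For the generic case mentioned above, the byte-copy loop would roughly need to
refill the buffer whenever it runs dry. An untested sketch, reusing the variables
from the function above and assuming net_rand_state has been fetched with
get_cpu_var() as in the first branch:)

        /* Serve requests of any size by refilling the 32-bit per-cpu
         * buffer from prandom_u32_state() whenever it runs out of bits. */
        while (bytes > 0) {
                if (!(*buffer))
                        *buffer = prandom_u32_state(state);
                *ptr++ = (u8) *buffer;
                *buffer >>= BITS_PER_BYTE;
                bytes--;
        }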
I will send the first version of the patch (calling prandom_u32() every time)
shortly, in case anyone wants to double-check the performance implications.
Best Regards,
Elena.