Date:   Mon, 5 Jun 2023 16:45:16 +0100 (BST)
From:   "Maciej W. Rozycki" <macro@...am.me.uk>
To:     Thomas Gleixner <tglx@...utronix.de>,
        "Jason A. Donenfeld" <Jason@...c4.com>
cc:     Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Kees Cook <keescook@...omium.org>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset
 randomization

On Wed, 22 Feb 2023, Maciej W. Rozycki wrote:

> > > > Please provide numbers on contemporary hardware.
> > >
> > >  Jason, is this something you could help me with to back up your claim?
> > >
> > >  My access to modern x86 gear is limited and I just don't have anything I
> > > can randomly fiddle with (I guess an Intel Core 2 Duo T5600 processor back
> > > from 2008 doesn't count as "contemporary", does it?).
> > 
> > I imagine tglx wants real life performance numbers rather than a
> > microbench of the rng. So the thing to do would be to exercise
> > arch_exit_to_user_mode() a bunch. Does this trigger on every syscall,
> > even invalid ones? If so, you could make a test like:
> > 
> >     #include <sys/syscall.h>
> >     #include <unistd.h>
> > 
> >     int main(int argc, char *argv[])
> >     {
> >             for (int i = 0; i < (1 << 26); ++i)
> >                     syscall(0xffffffff);
> >             return 0;
> >     }
> > 
> > And then see if the timing changes across your patch.
> 
>  Thanks.  Though that does not solve my lack of suitable hardware, sigh.  
> It's not like I have x86 systems scattered all over the place.  I guess I 
> could try to benchmark with said T5600 piece, but it won't be until April 
> at the earliest as I'm away most of the time.

 Thank you for waiting.  I have now been able to arrange for benchmarking 
with an "Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz" piece.  After a bit of 
research I chose `perf bench syscall all' to evaluate the change, as this 
tool is readily available and even bundled with the Linux source tree.  
Results are as follows:

1. Randomisation configured in, but disabled:

# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
     Total time: 4.601 [sec]

       0.460165 usecs/op
        2173132 ops/sec

# Running syscall/getpgid benchmark...
# Executed 10000000 getpgid() calls
     Total time: 3.241 [sec]

       0.324109 usecs/op
        3085383 ops/sec

# Running syscall/execve benchmark...
# Executed 10000 execve() calls
     Total time: 7.041 [sec]

     704.193800 usecs/op
           1420 ops/sec

2. Randomisation enabled, using RDTSC:

# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
     Total time: 4.995 [sec]

       0.499529 usecs/op
        2001886 ops/sec

# Running syscall/getpgid benchmark...
# Executed 10000000 getpgid() calls
     Total time: 3.625 [sec]

       0.362521 usecs/op
        2758460 ops/sec

# Running syscall/execve benchmark...
# Executed 10000 execve() calls
     Total time: 7.009 [sec]

     700.990800 usecs/op
           1426 ops/sec

3. Randomisation enabled, using `get_random_u8':

# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
     Total time: 6.053 [sec]

       0.605394 usecs/op
        1651817 ops/sec

# Running syscall/getpgid benchmark...
# Executed 10000000 getpgid() calls
     Total time: 4.641 [sec]

       0.464124 usecs/op
        2154598 ops/sec

# Running syscall/execve benchmark...
# Executed 10000 execve() calls
     Total time: 7.023 [sec]

     702.355400 usecs/op
           1423 ops/sec

 There is some variance between runs, but the trend is stable.  NB these 
figures have been obtained with 6.3.0 (both Linux and `perf') and GCC 11.
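
 For a cross-check outside `perf', a rough standalone harness along the 
lines of Jason's invalid-syscall loop could look like the sketch below; 
the iteration count and the use of clock_gettime(2) are my own choices 
here and were not used for the figures above:

    #include <stdio.h>
    #include <time.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
            struct timespec t0, t1;
            const long iters = 1L << 26;    /* ~67M calls */
            double ns;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (long i = 0; i < iters; ++i)
                    syscall(0xffffffff);    /* invalid: quick ENOSYS round trip */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            printf("%.1f ns/syscall\n", ns / iters);
            return 0;
    }

Running it against kernels built with each of the three configurations 
should show the same relative trend as the getppid() figures.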

 So, in terms of getppid() throughput, enabling randomisation with RDTSC 
and with `get_random_u8' makes fast syscalls respectively about 8% and 24% 
slower (2001886 and 1651817 vs 2173132 ops/sec).  It has been expected 
that a call to `get_random_u8' will be slower than RDTSC, but can we 
accept the slowdown given the security concerns about RDTSC?
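
 For context, the change being measured boils down to switching the 
entropy source fed to `choose_random_kstack_offset' on the syscall exit 
path; roughly the following (quoted from memory rather than from the 
actual patch, so the exact hunk may differ):

    /* arch/x86/include/asm/entry-common.h, arch_exit_to_user_mode_prepare() */
    -	choose_random_kstack_offset(rdtsc() & 0xFF);
    +	choose_random_kstack_offset(get_random_u8());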

 What are the next steps then?

  Maciej
