lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 7 Jun 2022 16:19:26 +0200
From:   Ard Biesheuvel <ardb@...nel.org>
To:     "Jason A. Donenfeld" <Jason@...c4.com>,
        Catalin Marinas <catalin.marinas@....com>
Cc:     linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Stephen Boyd <swboyd@...omium.org>,
        Russell King <linux@...linux.org.uk>,
        Arnd Bergmann <arnd@...db.de>,
        Phil Elwell <phil@...pberrypi.com>
Subject: Re: [PATCH] random: defer use of bootloader randomness to random_init()

On Tue, 7 Jun 2022 at 14:22, Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> Hi Ard,
>
> On Tue, Jun 07, 2022 at 02:15:27PM +0200, Ard Biesheuvel wrote:
> > Note that jump labels use asm() blocks, which are opaque to the
> > compiler, and so it is not guaranteed that codegen will be better than
>
> I actually spent a lot of time looking at the codegen on a few
> platforms.
>

I did a quick experiment on ThunderX2 with the following program

#include <stdio.h>
#include <stdlib.h>
#include <sys/random.h>

static unsigned char buf[16];

int main(void)
{
  for (int i = 0; i < 1000000; i++) {
    if (getrandom(buf, sizeof(buf),
        GRND_RANDOM | GRND_NONBLOCK) < sizeof(buf)) {
          fprintf(stderr, "getrandom() error!\n");
          exit(-1);
    }
  }
  return 0;
}

both with and without your revert patch of f5bda35fba615ac applied
onto v5.19-rc1, the results are below (Cortex-A57 @ 2 GHz):

############## WITH STATIC BRANCH

ard@...food:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            906.01 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                40      page-faults               #    0.044 K/sec
     1,812,010,034      cycles                    #    2.000 GHz
     1,944,549,733      instructions              #    1.07  insn per cycle
   <not supported>      branches
         2,014,408      branch-misses

       0.906695576 seconds time elapsed

       0.140334000 seconds user
       0.765871000 seconds sys


ard@...food:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            918.37 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                37      page-faults               #    0.040 K/sec
     1,836,733,451      cycles                    #    2.000 GHz
     1,944,572,442      instructions              #    1.06  insn per cycle
   <not supported>      branches
         3,012,020      branch-misses

       0.919027080 seconds time elapsed

       0.159721000 seconds user
       0.758718000 seconds sys


ard@...food:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            902.06 msec task-clock                #    0.999 CPUs utilized
                 1      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                39      page-faults               #    0.043 K/sec
     1,804,103,600      cycles                    #    2.000 GHz
     1,944,563,889      instructions              #    1.08  insn per cycle
   <not supported>      branches
         1,956,027      branch-misses

       0.902793520 seconds time elapsed

       0.172464000 seconds user
       0.729822000 seconds sys

############## WITHOUT STATIC BRANCH

ard@...food:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            924.79 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                39      page-faults               #    0.042 K/sec
     1,849,568,681      cycles                    #    2.000 GHz
     1,950,584,115      instructions              #    1.05  insn per cycle
   <not supported>      branches
         4,012,227      branch-misses

       0.925500832 seconds time elapsed

       0.204227000 seconds user
       0.720739000 seconds sys


ard@...food:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            933.06 msec task-clock                #    0.999 CPUs utilized
                 2      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                39      page-faults               #    0.042 K/sec
     1,866,097,571      cycles                    #    2.000 GHz
     1,950,574,944      instructions              #    1.05  insn per cycle
   <not supported>      branches
         3,984,008      branch-misses

       0.933798032 seconds time elapsed

       0.323067000 seconds user
       0.610271000 seconds sys


ard@...food:~$ perf_5.10 stat ./rnd

 Performance counter stats for './rnd':

            913.16 msec task-clock                #    0.999 CPUs utilized
                 1      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                37      page-faults               #    0.041 K/sec
     1,826,308,530      cycles                    #    2.000 GHz
     1,950,559,902      instructions              #    1.07  insn per cycle
   <not supported>      branches
         2,231,050      branch-misses

       0.913874672 seconds time elapsed

       0.164228000 seconds user
       0.749157000 seconds sys

So that's a 0.3% reduction (in terms of actual instructions executed)
in a synthetic benchmark that does nothing but call getrandom() in a
tight loop.

In other words, I think the static branch solves a problem that does not exist.

> > > > - Why do we need to enable this static key so early?
> > >
> > > We don't need to enable it especially early. I've now sent three
> > > different approaches for deferring it until later and you suggested one.
> > > The first of mine is kind of ugly (checking static_key_initialized and
> > > such at different points).  Your suggested one after that did the same
> > > but deferred into crng_reseed(), which I'm not a fan of. My second one
> > > is this patch, which is flawed for the reason you pointed out. But
> > > perhaps my third one is the right amount of simple and okay? That's the
> > > one I linked up top, [1]. Let me know what you think of that.
> > >
> > > My motivation for not wanting to defer it is that if the arch solution
> > > winds up being easy and straight forward (as it was for arm64), then it
> > > would be nice to not need to clutter up random.c as a result.
> >
> > If clutter is a concern, how about getting rid of the
> > execute_in_process_context() dance, and just use a late_initcall()
> > instead?
>
> As I already explained in [1], this does not work. If the order is
> (A)(B), then all this will happen *after* the late init call.
>
> [1] https://lore.kernel.org/lkml/Yp8oOH+9V336LrLk@zx2c4.com/
>

Yeah, I guess finding the right spot is tricky. The more reason just
to drop it altogether.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ