Message-ID: <CAD=FV=VG-BzzEJ2jn6hAYjre+BtOu-uyi4OQst=Lg9QQqAtKNw@mail.gmail.com>
Date:   Tue, 22 Sep 2020 17:39:27 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Ard Biesheuvel <ardb@...nel.org>
Cc:     David Laight <David.Laight@...lab.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Jackie Liu <liuyun01@...inos.cn>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>
Subject: Re: [PATCH] arm64: crypto: Add an option to assume NEON XOR is the fastest

Hi,

On Tue, Sep 22, 2020 at 3:30 AM Ard Biesheuvel <ardb@...nel.org> wrote:
>
> On Tue, 22 Sep 2020 at 10:26, David Laight <David.Laight@...lab.com> wrote:
> >
> > From: Douglas Anderson
> > > Sent: 22 September 2020 01:26
> > >
> > > On every boot we see messages like this:
> > >
> > > [    0.025360] calling  calibrate_xor_blocks+0x0/0x134 @ 1
> > > [    0.025363] xor: measuring software checksum speed
> > > [    0.035351]    8regs     :  3952.000 MB/sec
> > > [    0.045384]    32regs    :  4860.000 MB/sec
> > > [    0.055418]    arm64_neon:  5900.000 MB/sec
> > > [    0.055423] xor: using function: arm64_neon (5900.000 MB/sec)
> > > [    0.055433] initcall calibrate_xor_blocks+0x0/0x134 returned 0 after 29296 usecs
> > >
> > > As you can see, we spend 30 ms on every boot re-confirming that, yet
> > > again, the arm64_neon implementation is the fastest way to do XOR.
> > > ...and the above is on a system with HZ=1000.  Due to the way the
> > > testing happens, if we have HZ defined to something lower it'll take
> > > much longer.  HZ=100 means we spend 300 ms on every boot re-confirming
> > > a fact that will be the same for every bootup.
> >
> > Can't the code use a TSC (or similar high-res counter) to
> > see how long it takes to process a short 'hot cache' block?
> > That wouldn't take long at all.
> >
>
> This is generic code that runs from a core_initcall(), so I am not
> sure we can easily implement this in a portable way.
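A rough way to see why the quoted patch says the cost scales with HZ: the timestamps in the boot log advance by about 10 ms per template (0.025 -> 0.035 -> 0.045 -> 0.055 s), which is about 10 jiffies each at HZ=1000. Assuming each template costs a fixed number of jiffies regardless of HZ (an assumption read off the log, not checked against the xor calibration code), the total time is templates x jiffies / HZ. The helper `calib_ms` below is hypothetical and only does that arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: estimated calibration time in milliseconds,
 * ASSUMING each template costs a fixed number of jiffies no matter
 * what HZ is (inferred from the boot log, not from the xor code). */
static unsigned long calib_ms(unsigned templates, unsigned jiffies_per_template,
                              unsigned hz)
{
    assert(hz > 0);
    /* one jiffy lasts 1000 / hz milliseconds */
    return (unsigned long)templates * jiffies_per_template * 1000UL / hz;
}
```

With the three templates from the log (8regs, 32regs, arm64_neon) at about 10 jiffies each, `calib_ms(3, 10, 1000)` is 30 and `calib_ms(3, 10, 100)` is 300, matching the 30 ms and 300 ms figures above.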

If it ran later, presumably you could just use ktime?  That seems like
it'd be a portable enough way?
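As a user-space sketch of the high-resolution-timer idea (not kernel code), the example below times a few passes over a small, cache-hot buffer with `clock_gettime(CLOCK_MONOTONIC)` and keeps the fastest candidate. In the kernel, `ktime_get_ns()` would presumably play the role of `now_ns()` once calibration runs late enough for it to be usable. The two implementations, the 4 KiB buffer, and the 64 repetitions are all hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*xor_fn)(size_t bytes, uint8_t *dst, const uint8_t *src);

/* Hypothetical candidate: one byte at a time. */
static void xor_bytewise(size_t bytes, uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < bytes; i++)
        dst[i] ^= src[i];
}

/* Hypothetical candidate: 8 bytes at a time via memcpy (sidesteps
 * alignment and aliasing rules), with a bytewise tail. */
static void xor_wordwise(size_t bytes, uint8_t *dst, const uint8_t *src)
{
    size_t i = 0;
    for (; i + 8 <= bytes; i += 8) {
        uint64_t a, b;
        memcpy(&a, dst + i, 8);
        memcpy(&b, src + i, 8);
        a ^= b;
        memcpy(dst + i, &a, 8);
    }
    for (; i < bytes; i++)
        dst[i] ^= src[i];
}

/* Monotonic nanosecond timestamp (user-space stand-in for ktime). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* One warm-up pass pulls the buffers into cache, then `reps` timed passes. */
static uint64_t time_xor(xor_fn fn, uint8_t *dst, const uint8_t *src,
                         size_t bytes, int reps)
{
    fn(bytes, dst, src);
    uint64_t start = now_ns();
    for (int r = 0; r < reps; r++)
        fn(bytes, dst, src);
    return now_ns() - start;
}

/* Return the candidate with the smallest measured time. */
static xor_fn pick_fastest(const xor_fn *cands, size_t n)
{
    static uint8_t dst[4096], src[4096];
    assert(n > 0);
    xor_fn best = cands[0];
    uint64_t best_ns = UINT64_MAX;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = time_xor(cands[i], dst, src, sizeof(dst), 64);
        if (t < best_ns) {
            best_ns = t;
            best = cands[i];
        }
    }
    return best;
}
```

Because the buffer is small and warmed first, this measures hot-cache throughput in microseconds rather than waiting on jiffy boundaries; whether that is representative of real RAID workloads is a separate question.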


> Doug: would it help if we deferred this until late_initcall()? We
> could take an arbitrary pick from the list at core_initcall() time to
> serve early users, and update to the fastest one at a later time.

Yeah, I think that'd work OK.  One advantage of it being later would
be that it could run in parallel to other things that were happening
in the system (anyone who enabled async probe on their driver).  Even
better would be if your code itself could run async and not block the
rest of boot.  ;-)  I do like the idea that we could just arbitrarily
pick one implementation until we've calibrated.  I guess we'd want to
figure out how to do this lockless, but it shouldn't be too hard to
just check to see if a single pointer is non-NULL and once it becomes
non-NULL then you can use it...  ...or a pointer plus a sentinel if
writing the pointer can't be done atomically...
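A minimal C11 sketch of the "pick one now, swap in the fastest later" idea: the dispatch pointer starts at an arbitrary always-correct implementation, so it is never NULL, and late calibration publishes the winner with one atomic store. Readers do a single atomic load with no lock. The function names here are hypothetical; inside the kernel the analogous primitives would presumably be `smp_store_release()`/`smp_load_acquire()` or `WRITE_ONCE()`/`READ_ONCE()` rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*xor_fn)(size_t bytes, uint8_t *dst, const uint8_t *src);

/* Hypothetical early pick: simple and always correct. */
static void xor_generic(size_t bytes, uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < bytes; i++)
        dst[i] ^= src[i];
}

/* Hypothetical calibration winner (stand-in for a NEON routine). */
static void xor_unrolled(size_t bytes, uint8_t *dst, const uint8_t *src)
{
    size_t i = 0;
    for (; i + 4 <= bytes; i += 4) {
        dst[i] ^= src[i];
        dst[i + 1] ^= src[i + 1];
        dst[i + 2] ^= src[i + 2];
        dst[i + 3] ^= src[i + 3];
    }
    for (; i < bytes; i++)
        dst[i] ^= src[i];
}

/* Never NULL: initialised to the early pick, so no sentinel is needed. */
static _Atomic(xor_fn) active_xor = xor_generic;

/* Hot path: one acquire load, then call whatever is current. */
static void xor_blocks_dispatch(size_t bytes, uint8_t *dst, const uint8_t *src)
{
    xor_fn fn = atomic_load_explicit(&active_xor, memory_order_acquire);
    fn(bytes, dst, src);
}

/* Late calibration: release-store the winner so any state it set up
 * before publishing is visible to readers that observe the new pointer. */
static void publish_fastest(xor_fn winner)
{
    atomic_store_explicit(&active_xor, winner, memory_order_release);
}
```

Callers that race with `publish_fastest()` simply get either the old or the new implementation, and both produce the same result, which is what makes the lockless swap safe here.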

It also feels like with the large number of big.LITTLE systems out
there, you'd either want a lookup table per core or you'd want to do
calibration per core.
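A sketch of the per-core lookup table, assuming (as speculated above, not measured here) that big and LITTLE cores may rank the implementations differently. Given a matrix of speeds measured with the benchmark pinned to each CPU, it records the index of the fastest implementation per CPU; a dispatcher would then index its candidate table by the current CPU. `MAX_CPUS`, `MAX_IMPLS` and `choose_per_cpu` are hypothetical; in the kernel the result would presumably live in a per-CPU variable (`DEFINE_PER_CPU`).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS  8   /* hypothetical bounds for the sketch */
#define MAX_IMPLS 4

/* speed[cpu][impl] holds MB/s measured on that CPU (the pinned
 * measurement itself is out of scope). Writes the index of the
 * fastest implementation for each CPU into best[cpu]. */
static void choose_per_cpu(int ncpus, size_t nimpls,
                           const unsigned speed[][MAX_IMPLS],
                           size_t best[])
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    assert(nimpls > 0 && nimpls <= MAX_IMPLS);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        size_t win = 0;
        for (size_t i = 1; i < nimpls; i++)
            if (speed[cpu][i] > speed[cpu][win])
                win = i;
        best[cpu] = win;
    }
}
```

Calibrating once per core type rather than once per CPU would cut the boot cost further, at the price of needing to know which CPUs share a microarchitecture.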

-Doug
