lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 20 Jul 2017 16:24:32 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Anshul Garg <aksgarg1989@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "anshul.g@...sung.com" <anshul.g@...sung.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Joe Perches <joe@...ches.com>
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function

On Thu, Jul 20, 2017 at 3:34 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>>
>>  "Why does this matter, and what is the range it matters for?"
>
> I was looking to do some work on the idle estimator. Parts of that keep
> online avg and variance for normal distributions. I wanted to bias the
> avg downwards, the way to do that is to subtract a scaled stdev from it.
> Computing the stdev from a variance requires the sqrt.
>
> In any case, I suppose the range of values would be in the order of
> TICK_NSEC, so the variance would be a number of orders below that. So
> we're looking at fairly small numbers <1e5.

Ok. So that already cuts down the problem space a lot.

And yes, the current code is very suboptimal for exactly the small number case.

It's also typiocally the case that right-shifting is more expensive
than left-shifting, so the fact that we left-shift twice in that loop
for all the big values is extra expensive.

So I would actually suggest a different approach:

 - start with a small 'm'

 - and grow it up.

The "small m" means that there is a small constant, which is good.

And growing it up is a single trivial left shift by two, which can
also be done just two adds or as a single lea, so it works fine even
on architectures with slow shifters etc.

So mayube something like the attached?

NOTE NOTE NOTE! This may be pure garbage. It's untested. But the code
generation looks ok even if gcc is being way too clever and turns that
first loop into a counted loop instead. Whatever, maybe it's the right
thing to do.

But note that I might have broken the sqrt for some case, so you need
to double-check my logic there. The *intention* is that it's the exact
same thing as it used to do, just initializing 'm' differently.

                 Linus

View attachment "patch.diff" of type "text/plain" (745 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ