linux-kernel - Re: [PATCH] lib/int_sqrt.c: Optimize square root function

Message-ID: <CA+55aFyBxxzNHH_z2BDNP5kmupSMa07wKK+6j=aURHN-tbMSQg@mail.gmail.com>
Date:   Thu, 20 Jul 2017 11:31:36 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Anshul Garg <aksgarg1989@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "anshul.g@...sung.com" <anshul.g@...sung.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Joe Perches <joe@...ches.com>
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function

How did this two-year-old thread get resurrected?

Anyway, it got resurrected without even answering one core question:

On Thu, Jul 20, 2017 at 4:24 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote:
>> On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>> >
>> > (I'm also not entirely sure what uses int_sqrt() that ends up being so
>> > performance-critical, so it would be good to document that too, since
>> > that probably also matters for the "what's the normal argument range"
>> > question..)

This is still the case. Which of the (very few) users really _care_?
And what are the normal values for that?

For example, the 802.11 minstrel code does a "MINSTREL_TRUNC()" on an
"unsigned int" value that is in millions. It's not even "unsigned
long", so we know it's not many thousands of millions, and
MINSTREL_TRUNC shifts it down by 12 bits.

So we know we have at most a 20-bit argument.
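To make the range argument concrete: any 32-bit "unsigned int" shifted right by 12 bits fits in 20 bits, so its integer square root fits in 10 bits. The sketch below checks that arithmetic. TRUNC_SHIFT and trunc_like_minstrel() are hypothetical stand-ins for the shift described above, not the actual minstrel macros, and isqrt_ref() is a deliberately naive reference, not int_sqrt().

```c
#include <stdint.h>

/* Hypothetical: mirrors the 12-bit shift described for MINSTREL_TRUNC. */
#define TRUNC_SHIFT 12

static uint32_t trunc_like_minstrel(uint32_t val)
{
	return val >> TRUNC_SHIFT;
}

/* Naive reference integer square root: largest r with r * r <= x. */
static uint32_t isqrt_ref(uint32_t x)
{
	uint32_t r = 0;

	/* Widen to 64 bits so (r + 1)^2 cannot overflow. */
	while ((uint64_t)(r + 1) * (r + 1) <= x)
		r++;
	return r;
}
```

With the largest possible input, trunc_like_minstrel(UINT32_MAX) is 0xFFFFF (1048575), and its square root is 1023, since 1024 * 1024 = 1048576 is just out of range.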

The one case that uses actual unsigned long seems to be
"slow_is_prime_number()", and honestly, the sqrt() is the *least* of
our problems there.
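To illustrate why the square root is the cheap part of a trial-division primality test: the bound is computed once, in a handful of steps, and then the loop performs roughly sqrt(n)/2 divisions. The sketch below is hypothetical: isqrt_newton() and trial_division_is_prime() are illustrative names, and this is not the kernel's slow_is_prime_number(). It counts the trial divisions so the cost is visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Integer Newton's method: returns floor(sqrt(n)). */
static uint64_t isqrt_newton(uint64_t n)
{
	uint64_t x, y;

	if (n < 2)
		return n;
	x = n / 2 + 1;            /* any start >= floor(sqrt(n)) works */
	y = (x + n / x) / 2;
	while (y < x) {           /* iterates decrease until they settle */
		x = y;
		y = (x + n / x) / 2;
	}
	return x;
}

/*
 * Hypothetical trial-division test. The square root is computed once;
 * the loop then tries odd divisors up to that bound, which dominates.
 * *divisions receives the number of odd trial divisions performed.
 */
static bool trial_division_is_prime(uint64_t n, uint64_t *divisions)
{
	uint64_t limit, d;

	*divisions = 0;
	if (n < 2)
		return false;
	if (n % 2 == 0)
		return n == 2;

	limit = isqrt_newton(n);
	for (d = 3; d <= limit; d += 2) {
		(*divisions)++;
		if (n % d == 0)
			return false;
	}
	return true;
}
```

For n = 1000003 (a prime), the bound is 1000 and the loop performs 499 divisions, while the square root takes about a dozen Newton steps.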

There are a few drivers and filesystems that use it. I do not believe
performance matters in those cases, even if you did an "int_sqrt()" per
network packet on some high-performance network (and none of them
look anything like that).

And there's a couple of VM users. They don't look particularly critical either.

So why do you care? Because honestly, calling int_sqrt() once in a
blue moon with caches cold and no branch prediction information tends
to have very different performance characteristics from calling it in
a loop with very predictable input.

So I think your "benchmark" is just garbage, in that it's testing
something entirely different than the actual load.

Also, since this is a generic library routine, no way can we depend on
fls being fast.
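On architectures without a find-last-set instruction, a generic routine has to fall back to a portable fls(). A minimal sketch of such a fallback follows. fls32_portable() is a hypothetical name, written in the style of a binary-search fallback rather than copied from any kernel header. It costs up to five data-dependent branches per call, which is the kind of hidden cost that makes "fls is fast" an unsafe assumption in generic code.

```c
/*
 * Hypothetical portable fls(): returns the 1-based index of the most
 * significant set bit, or 0 when x == 0. Narrows the position by
 * halves (16, 8, 4, 2, 1 bits), one branch per step.
 */
static int fls32_portable(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u)) {
		x <<= 1;
		r -= 1;
	}
	return r;
}
```

On hardware with a native instruction (or a compiler builtin such as GCC's __builtin_clz), the same result costs one or two instructions; the point is that a generic library routine cannot count on that.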

But we could certainly improve on the initial value a lot. It's just
that we should probably strive to improve on it without adding extra
branch misprediction or I$ misses - both things that your benchmark
isn't actually testing at all, since it does the exact opposite of
that by basically preloading both.
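One way to improve the initial value without relying on fls() is to start from the highest power of four an unsigned long can hold and step it down until it no longer exceeds the argument, then run the usual bit-by-bit loop. The sketch below shows that idea; int_sqrt_sketch() and SKETCH_BITS_PER_LONG are hypothetical names, and this is not the kernel's lib/int_sqrt.c. The extra loop still has a data-dependent trip count, so it does not by itself settle the branch-misprediction question raised above; it only lets small arguments skip the unused high bit pairs.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit-by-bit (digit-by-digit) integer square root: floor(sqrt(x)). */
static unsigned long int_sqrt_sketch(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/*
	 * Initial value without fls(): begin at the top power of four and
	 * drop powers of four that exceed x, so the main loop only visits
	 * the bit pairs x actually uses.
	 */
	m = 1UL << (SKETCH_BITS_PER_LONG - 2);
	while (m > x)
		m >>= 2;

	/* One result bit per iteration, consuming two argument bits. */
	while (m != 0) {
		b = y + m;
		y >>= 1;
		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}
```

For a 20-bit argument such as the minstrel case, the main loop runs 10 times instead of 16 (32-bit long) or 32 (64-bit long); whether that matters in practice depends on exactly the call pattern questioned above.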

And the *most* important question is that first one:

 "Why does this matter, and what is the range it matters for?"

                  Linus