Message-ID: <m3my8bvrhl.fsf@lugabout.jhcloos.org>
Date:	Sat, 13 Jun 2009 15:54:38 -0400
From:	James Cloos <cloos@...loos.com>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	linux-kernel@...r.kernel.org,
	"Linux-MIPS" <linux-mips@...ux-mips.org>,
	Florian Fainelli <florian@...nwrt.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Takashi Iwai <tiwai@...e.de>,
	Ralf Baechle <ralf@...ux-mips.org>
Subject: Re: [PATCH 1/8] add lib/gcd.c

>>>>> "|" == James Cloos <cloos@...loos.com> writes:
>>>>> "Alan" == Alan Cox <alan@...rguk.ukuu.org.uk> writes:

|> Would the binary gcd algorithm not be a better fit for the kernel?

Alan> Could well be the shift based one is better for some processors only.

|> Very likely, I suspect.

|> In any case, I do not have the hardware to do any statistically
|> significant testing;

I take that back.  Just in case speed is a relevant issue, I ran a test
on my MX, which is a small xen domU running on a:
,----
| EFamily: 0 EModel: 0 Family: 6 Model: 15 Stepping: 11
| CPU Model: Core 2 Quad 
| Processor name string: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
`----
I got, compiling with gcc-4.4 -march=native -O3:

binary
408.39user 0.05system 6:52.75elapsed 98%CPU

quick (the code in the kernel)
600.96user 0.16system 10:19.06elapsed 97%CPU

contfrac (the typical euclid algo)
569.19user 0.12system 9:35.50elapsed 98%CPU

extended euclid (calculates g=ia+jb=gcd(a,b))
684.53user 0.13system 11:32.77elapsed 98%CPU

I also tried on an old Alpha at freeshell, which had gcc-3.3.  gcc's -S
output looks like it uses hardware div there, just like it does on
x86 and amd64.  The bgcd, though, was 10-16 times faster than either
version of euclid's algo.

On my laptop's P3M, binary gcd was about twice as fast as euclid.

So, although modern processors are *much* better at int div, the
binary gcd algo is still faster.

The timings on the alpha and the laptop were of:

    for (a=0xFFF; a > 0; a--)
        for (b=a; b > 0; b--)
            g=gcd(a,b);

For the core2 times quoted above, I started with a=0xFFFF.
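
For anyone wanting to repeat the measurement, here is a minimal sketch of that loop as a reusable function.  The names euclid_gcd and gcd_checksum are hypothetical, and the running sum is an addition that is not in the loop above; it is there only so that -O3 cannot discard the gcd calls as dead code.  The plain remainder-based Euclid is just a stand-in for whichever gcd is being timed.

```c
/* Plain remainder-based Euclid, used here only as the function under test.
 * Hypothetical name, not the kernel's or the thread's code. */
unsigned int euclid_gcd(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Call gcd(a, b) for every pair with 1 <= b <= a <= limit, in the same
 * order as the loop above, and return the sum of the results so the
 * compiler has to keep every call. */
unsigned long long gcd_checksum(unsigned int limit,
                                unsigned int (*gcd)(unsigned int, unsigned int))
{
    unsigned long long sum = 0;
    unsigned int a, b;

    for (a = limit; a > 0; a--)
        for (b = a; b > 0; b--)
            sum += gcd(a, b);

    return sum;
}
```

Calling gcd_checksum(0xFFF, euclid_gcd) from main() and running the binary under time(1) gives numbers in the same user/system/elapsed form as above.  Comparing the returned sums across implementations also doubles as a cheap correctness check.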

And I forgot to mention:  the bgcd code I posted was based on
some old notes of mine which most likely trace to TAoCP.
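
For readers without the earlier message at hand, the shift-and-subtract idea can be sketched as follows.  This is not the bgcd code posted earlier in the thread, only an illustration of the general technique, and the name bgcd_sketch is hypothetical.

```c
/* Sketch of a binary (shift-based) gcd: only shifts, compares and
 * subtractions, no division.  Hypothetical name; not the code posted
 * earlier in this thread. */
unsigned long bgcd_sketch(unsigned long a, unsigned long b)
{
    unsigned int shift = 0;

    /* gcd(0, x) = x */
    if (a == 0)
        return b;
    if (b == 0)
        return a;

    /* Strip the factors of two common to both, remembering how many. */
    while (((a | b) & 1) == 0) {
        a >>= 1;
        b >>= 1;
        shift++;
    }

    /* Any remaining factors of two in a are not common; drop them. */
    while ((a & 1) == 0)
        a >>= 1;

    do {
        /* a is odd here, so factors of two in b cannot be common. */
        while ((b & 1) == 0)
            b >>= 1;

        /* Both odd now: keep a <= b, then subtract.  The difference of
         * two odd numbers is even, so the next pass shifts again. */
        if (a > b) {
            unsigned long t = a;
            a = b;
            b = t;
        }
        b -= a;
    } while (b != 0);

    /* Restore the common power of two. */
    return a << shift;
}
```

The inner shift loops are where a count-trailing-zeros instruction could help on processors that have one, which is one reason the relative speed may vary by architecture.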

-JimC
-- 
James Cloos <cloos@...loos.com>         OpenPGP: 1024D/ED7DAEA6