lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4672F5DF.7070506@zytor.com>
Date:	Fri, 15 Jun 2007 13:26:07 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	"Robert P. J. Day" <rpjday@...dspring.com>
CC:	Vegard Nossum <vegard.nossum@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Optimize is_power_of_2().

Robert P. J. Day wrote:
> On Fri, 15 Jun 2007, H. Peter Anvin wrote:
> 
>> Vegard Nossum wrote:
>>> From: Vegard Nossum <vegard@...tkore.net>
>>> Date: Fri, 15 Jun 2007 18:35:49 +0200
>>> Subject: [PATCH] Optimize is_power_of_2().
>>>
>>> Rationale: Removes one conditional branch and reduces icache
>>> footprint. Proof: If n is false, the product of n and any value is
>>> false. If n is true, the truth of (n * x) is the truth of x.
>> You realize that on a lot of platforms, multiplication is done by an
>> out-of-line subroutine, and that even on those that aren't, it's
>> generally a long-pipeline operation, right?
> 
> i was *just* about to mention somthing of the sort, but probably not
> as succinctly.
> 

You also realize, I hope, that obfuscating this for the compiler just
hurts.  As has already been described, there are versions involving
jumps, selects, multiplication, tables, ffs, and popcount.  All of those
operations are hideously expensive on some architectures and cheap on
others.

Personally, I would bet that the version as originally written is
probably best, since select (conditional move) is a fairly commonly
supported operation, especially on architectures where jumps are expensive.

Just to confuse the situation further, here is yet another version:

bool is_power_of_2(unsigned long n)
{
	return !!n & !(n & (n-1));
}

... which, on i686 compiles to:
00000000 <is_power_of_2>:
   0:   85 c0                   test   %eax,%eax
   2:   8d 50 ff                lea    0xffffffff(%eax),%edx
   5:   0f 95 c1                setne  %cl
   8:   85 d0                   test   %edx,%eax
   a:   0f 94 c0                sete   %al
   d:   0f b6 c0                movzbl %al,%eax
  10:   21 c8                   and    %ecx,%eax
  12:   c3                      ret

In the current version, gcc does generate a jump on i686 and x86-64:

00000000 <is_power_of_2>:
   0:   89 c2                   mov    %eax,%edx
   2:   31 c0                   xor    %eax,%eax
   4:   85 d2                   test   %edx,%edx
   6:   74 0b                   je     13 <is_power_of_2+0x13>
   8:   8d 42 ff                lea    0xffffffff(%edx),%eax
   b:   85 c2                   test   %eax,%edx
   d:   0f 94 c0                sete   %al
  10:   83 e0 01                and    $0x1,%eax
  13:   f3 c3                   repz ret

	-hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ