lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 12 Feb 2015 10:58:50 +0100
From:	Rasmus Villemoes <linux@...musvillemoes.dk>
To:	"George Spelvin" <linux@...izon.com>
Cc:	yury.norov@...il.com, akpm@...ux-foundation.org,
	chris@...is-wilson.co.uk, davem@...emloft.net, dborkman@...hat.com,
	hannes@...essinduktion.org, klimov.linux@...il.com,
	laijs@...fujitsu.com, linux-kernel@...r.kernel.org,
	msalter@...hat.com, takahiro.akashi@...aro.org, tgraf@...g.ch,
	valentinrothberg@...il.com
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

On Thu, Feb 12 2015, "George Spelvin" <linux@...izon.com> wrote:

>> Rasmus, your version has ANDing by mask, and resetting the mask at each iteration
>> of main loop. I think we can avoid it. What do you think on next?
>
> Yes, that's basically what I proposed (modulo checking for zero size and
> my buggy LAST_WORD_MASK).
>
> But two unconditional instructions in the loop are awfully minor; it's
> loads and conditional branches that cost.
>
> The reset of the mask can be done in parallel with other operations; it's
> only the AND that actually takes a cycle.
>
> I can definitely see the argument that, for code that's not used often
> enough to stay resident in the L1 cache, any speedup has to win by at
> least one L2 cache access to be worth taking another cache line.

That, and if the compiler was smart enough, the AND should actually be
free (at least on x86), since it can replace a test instruction. [1]

In any case, my code compiles to fewer bytes (though not an entire cache
line), and I think it is somewhat clearer - I'm obviously biased on the
latter point.

Rasmus


[1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
compile my code with gcc 5.0, I get

0000000000000000 <find_last_bit_rv>:
   0:   53                      push   %rbx
   1:   89 f1                   mov    %esi,%ecx
   3:   48 8d 5e 3f             lea    0x3f(%rsi),%rbx
   7:   f7 d9                   neg    %ecx
   9:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
  10:   48 83 e3 c0             and    $0xffffffffffffffc0,%rbx
  14:   48 d3 ea                shr    %cl,%rdx
  17:   eb 1a                   jmp    33 <find_last_bit_rv+0x33>
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  20:   48 89 d1                mov    %rdx,%rcx
  23:   48 23 0c df             and    (%rdi,%rbx,8),%rcx
  27:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
  2e:   48 85 c9                test   %rcx,%rcx
  31:   75 15                   jne    48 <find_last_bit_rv+0x48>
  33:   48 83 eb 01             sub    $0x1,%rbx
  37:   48 83 fb ff             cmp    $0xffffffffffffffff,%rbx
  3b:   75 e3                   jne    20 <find_last_bit_rv+0x20>
  3d:   48 89 f0                mov    %rsi,%rax
  40:   5b                      pop    %rbx
  41:   c3                      retq   
  42:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  48:   48 89 cf                mov    %rcx,%rdi
  4b:   48 c1 e3 06             shl    $0x6,%rbx
  4f:   e8 00 00 00 00          callq  54 <find_last_bit_rv+0x54>
  54:   48 98                   cltq   
  56:   48 01 d8                add    %rbx,%rax
  59:   5b                      pop    %rbx
  5a:   c3                      retq   
  5b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

the main loop is 20--3b. The test instruction at 2e seems to be
redundant. The same at 37: the sub instruction already sets plenty of
flags that could be used, so explicitly comparing %rbx to -1 seems
redundant.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ