linux-kernel - Re: [PATCH v2 1/3] lib/find_bit: introduce FIND_FIRST

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YwYlWjlWO3fFrtQp@yury-laptop>
Date:   Wed, 24 Aug 2022 06:19:22 -0700
From:   Yury Norov <yury.norov@...il.com>
To:     Andy Shevchenko <andy.shevchenko@...il.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Guenter Roeck <linux@...ck-us.net>,
        Dennis Zhou <dennis@...nel.org>,
        Russell King <linux@...linux.org.uk>,
        Catalin Marinas <catalin.marinas@....com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Alexey Klimov <aklimov@...hat.com>,
        Kees Cook <keescook@...omium.org>,
        Andy Whitcroft <apw@...onical.com>
Subject: Re: [PATCH v2 1/3] lib/find_bit: introduce FIND_FIRST_BIT() macro

On Wed, Aug 24, 2022 at 12:10:02PM +0300, Andy Shevchenko wrote:
> On Wed, Aug 24, 2022 at 4:51 AM Yury Norov <yury.norov@...il.com> wrote:
> >
> > Now that we have many flavors of find_first_bit(), and expect even more,
> > it's better to have one macro that generates optimal code for all and makes
> > maintaining of slightly different functions simpler.
> >
> > The logic common to all versions is moved to the new macro, and all the
> > flavors are generated by providing an EXPRESSION macro-parameter, like
> > in this example:
> >
> >   #define FIND_FIRST_BIT(EXPRESSION, size) ...
> >
> >   find_first_ornot_and_bit(addr1, addr2, addr3, size)
> >   {
> >         return FIND_NEXT_BIT(addr1[idx] | ~addr2[idx] & addr3[idx], size);
> >   }
> >
> > The EXPRESSION may be of any complexity, as soon as it only refers
> > the bitmap(s) and an iterator idx.
> >
> > The 'word_op' is here to allow the macro to generate code for _le
> > versions on big-endian machines; used in the following patches.
> 
> ...
> 
> > +#ifndef word_op
> > +#define word_op
> > +#endif
> 
> Not sure about the naming without namespace. Perhaps __ffs_word_op?
> 
> > +#define FIND_FIRST_BIT(EXPRESSION, size)                                       \
> > +({                                                                             \
> > +       unsigned long idx, val, sz = (size);                                    \
> > +                                                                               \
> > +       for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {                        \
> 
> I think we can do slightly better:
> 
> for (unsigned long idx = 0; idx < sz; idx += BITS_PER_LONG) {
>   unsigned long val;

This will blow up the EXPRESSION. We can mitigate it on user side:
  find_first_bit(addr, size)
  {
        return FIND_FIRST_BIT(addr[idx/BITS_PER_LONG], size);
  }

But to me it's a wtf++.

And generated code looks almost the same, except that
on x86_64 your version is bigger. Compare before:
0000000000000000 <_find_first_bit>:
   0:   mov    %rsi,%rax
   3:   test   %rsi,%rsi
   6:   je     35 <_find_first_bit+0x35>
   8:   xor    %edx,%edx
   a:   jmp    19 <_find_first_bit+0x19>
   c:   add    $0x40,%rdx               // Track bits and
  10:   add    $0x8,%rdi                // index separately
  14:   cmp    %rax,%rdx
  17:   jae    35 <_find_first_bit+0x35>
  19:   mov    (%rdi),%rcx
  1c:   test   %rcx,%rcx
  1f:   je     c <_find_first_bit+0xc>
  21:   tzcnt  %rcx,%rcx
  26:   add    %rdx,%rcx
  29:   cmp    %rcx,%rax
  2c:   cmova  %rcx,%rax
  30:   jmp    35 <_find_first_bit+0x35>
  35:   jmp    3a <_find_first_bit+0x3a>
  3a:   nopw   0x0(%rax,%rax,1)

And after:
0000000000000000 <_find_first_bit>:
   0:   mov    %rsi,%rax
   3:   test   %rsi,%rsi
   6:   je     39 <_find_first_bit+0x39>
   8:   xor    %edx,%edx
   a:   jmp    15 <_find_first_bit+0x15>
   c:   add    $0x40,%rdx               // Track bits only
  10:   cmp    %rdx,%rax 
  13:   jbe    39 <_find_first_bit+0x39>
  15:   mov    %rdx,%rcx
  18:   shr    $0x6,%rcx                // But divide here
  1c:   mov    (%rdi,%rcx,8),%rcx
  20:   test   %rcx,%rcx
  23:   je     c <_find_first_bit+0xc>
  25:   tzcnt  %rcx,%rcx
  2a:   add    %rcx,%rdx
  2d:   cmp    %rdx,%rax
  30:   cmova  %rdx,%rax
  34:   jmp    39 <_find_first_bit+0x39>
  39:   jmp    3e <_find_first_bit+0x3e>
  3e:   xchg   %ax,%ax                  // Which adds 4 bytes to .text 

Thanks,
Yury

> > +               val = (EXPRESSION);                                             \
> > +               if (val) {                                                      \
> > +                       sz = min(idx * BITS_PER_LONG + __ffs(word_op(val)), sz);\
> 
> sz = min(idx + __ffs(...));
> 
> > +                       break;                                                  \
> > +               }                                                               \
> > +       }                                                                       \
> > +                                                                               \
> > +       sz;                                                                     \
> > +})
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko