linux-kernel - Re: [PATCH] sched: remove sched_find_first

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170515205819.womvfygyo55hbua6@yury-N73SV>
Date:   Mon, 15 May 2017 23:58:19 +0300
From:   Yury Norov <ynorov@...iumnetworks.com>
To:     Arnd Bergmann <arnd@...db.de>
Cc:     Ingo Molnar <mingo@...nel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Richard Henderson <rth@...ddle.net>,
        Ivan Kokshaysky <ink@...assic.park.msu.ru>,
        Matt Turner <mattst88@...il.com>,
        Akinobu Mita <mita@...aclelinux.com>,
        Mike Galbraith <efault@....de>
Subject: Re: [PATCH] sched: remove sched_find_first_bit()

On Mon, May 15, 2017 at 10:31:17PM +0200, Arnd Bergmann wrote:
> On Mon, May 15, 2017 at 6:17 PM, Yury Norov <ynorov@...iumnetworks.com> wrote:
> > On Mon, May 15, 2017 at 06:06:18PM +0200, Arnd Bergmann wrote:
> >> On Mon, May 15, 2017 at 5:47 PM, Yury Norov <ynorov@...iumnetworks.com> wrote:
> >> > On Sun, May 14, 2017 at 08:09:17PM +0200, Ingo Molnar wrote:
> >> >
> >> > I also think that sched_find_first_bit() may be faster that find_first_bit()
> >> > because it's inlined in the caller. We can do so for find_first_bit() if
> >> > it takes small sizes at compile time, and so all parts of kernel will
> >> > use fast find_first_bit, not only sched.
> >>
> >> I suspect the first step would be to 'select GENERIC_FIND_FIRST_BIT'
> >> on ARM64, which should already improve the performance for those
> >> files that never call the 'next' variants.
> >>
> >> Adding an inline version of find_first_{,zero_}bit could also help, but
> >> is harder to quantify.
> >>
> >
> > I checked again, and in fact I measured on top of this patch:
> > https://lkml.org/lkml/2017/5/13/137
> > So find_first_bit is already enabled.
> 
> Ok. I've played around with this for a bit more and came to a generic
> version that is almost as good as the current sched_find_first_bit()
> on x86 (one extra comparison):
> 
> +#define sched_find_first_bit(b) find_first_bit(b, 128)
> -extern unsigned long find_first_bit(const unsigned long *addr,
> +extern unsigned long __find_first_bit(const unsigned long *addr,
>                                     unsigned long size);
> 
> +static inline unsigned long find_first_bit(const unsigned long *addr,
> +                                   unsigned long size)
> +{
> +       unsigned long idx;
> +
> +       if (!__builtin_constant_p(size))
> +               return __find_first_bit(addr, size);
> +
> +       idx = 0;
> +       switch (size) {
> +       case BITS_PER_LONG * 4:
> +               if (addr[0])
> +                       return __ffs(addr[0]) + idx;
> +               addr++;
> +               idx += BITS_PER_LONG;
> +       case BITS_PER_LONG * 3:
> +               if (addr[0])
> +                       return __ffs(addr[0]) + idx;
> +               addr++;
> +               idx += BITS_PER_LONG;
> +       case BITS_PER_LONG * 2:
> +               if (addr[0])
> +                       return __ffs(addr[0]) + idx;
> +               addr++;
> +               idx += BITS_PER_LONG;
> +       case BITS_PER_LONG * 1:
> +               if (addr[0])
> +                       return __ffs(addr[0]) + idx;
> +               addr++;
> +               idx += BITS_PER_LONG;
> +               return idx;
> +       }
> +
> +       return __find_first_bit(addr, size);
> +}
> 
Yes, something like this. But size is not the multiple of BITS_PER_LONG in
general. This should work better:

       switch (round_up(size), BITS_PER_LONG) {
       case BITS_PER_LONG * 4:
               if (addr[0])
                       goto ret;
               addr++;
               idx += BITS_PER_LONG;
       case BITS_PER_LONG * 3:
               if (addr[0])
                       goto ret;
               addr++;
               idx += BITS_PER_LONG;
       case BITS_PER_LONG * 2:
               if (addr[0])
                       goto ret;
               addr++;
               idx += BITS_PER_LONG;
       case BITS_PER_LONG * 1:
               if (addr[0])
                       goto ret;
               addr++;
               idx += BITS_PER_LONG;
               return idx;
       }

       return __find_first_bit(addr, size);

ret:
       return idx + min(__ffs(addr[0]), size % BITS_PER_LONG;
       }

(I didn't test it yet though)

> However, on architectures that rely on
> include/asm-generic/bitops/__ffs.h or something
> similarly verbose, this would just add needless bloat
> to the size rather than actually making a difference
> in performance.
> 
>         Arnd