Message-ID: <CAAH8bW-fb0wPwwvo8P8VW33zV=Wi_LPWxdJH8y2wdGGqPE+3nA@mail.gmail.com>
Date:   Mon, 7 Dec 2020 17:59:16 -0800
From:   Yury Norov <yury.norov@...il.com>
To:     Will Deacon <will@...nel.org>
Cc:     Catalin Marinas <catalin.marinas@....com>,
        linux-arm-kernel@...ts.infradead.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arch@...r.kernel.org, Alexey Klimov <aklimov@...hat.com>
Subject: Re: [PATCH] arm64: enable GENERIC_FIND_FIRST_BIT
(CC: Alexey Klimov)
On Mon, Dec 7, 2020 at 3:25 AM Will Deacon <will@...nel.org> wrote:
>
> On Sat, Dec 05, 2020 at 08:54:06AM -0800, Yury Norov wrote:
> > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
> > enable it in config. It leads to using find_next_bit() which is less
> > efficient:
>
> [...]
>
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1515f6f153a0..2b90ef1f548e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -106,6 +106,7 @@ config ARM64
> >       select GENERIC_CPU_AUTOPROBE
> >       select GENERIC_CPU_VULNERABILITIES
> >       select GENERIC_EARLY_IOREMAP
> > +     select GENERIC_FIND_FIRST_BIT
>
> Does this actually make any measurable difference? The disassembly with
> or without this is _very_ similar for me (clang 11).
>
> Will
On Cortex-A53, the generic find_first_bit() is almost twice as fast as the
find_next_bit()-based fallback, according to lib/find_bit_benchmark.
(Thanks to Alexey for testing.)
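
To illustrate why a dedicated find_first_bit() can beat the fallback, here is
a minimal userspace sketch. It is only loosely modeled on the kernel helpers
and is not the code from lib/find_bit.c; every sketch_* name is hypothetical.
The fallback goes through the general "start at an arbitrary offset" path
(range check, first-word masking, rounding), while the dedicated version just
walks whole words.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Index of the lowest set bit; the caller must pass a nonzero word. */
unsigned long sketch_ffs(unsigned long word)
{
	unsigned long n = 0;

	assert(word != 0);
	while (!(word & 1UL)) {
		word >>= 1;
		n++;
	}
	return n;
}

/* General search starting at an arbitrary bit offset. */
unsigned long sketch_find_next_bit(const unsigned long *addr,
				   unsigned long size, unsigned long offset)
{
	unsigned long word, pos;

	if (offset >= size)
		return size;

	word = addr[offset / SKETCH_BITS_PER_LONG];
	/* Ignore bits below the starting offset in the first word. */
	word &= ~0UL << (offset % SKETCH_BITS_PER_LONG);
	/* Round the offset down to a word boundary. */
	offset -= offset % SKETCH_BITS_PER_LONG;

	while (!word) {
		offset += SKETCH_BITS_PER_LONG;
		if (offset >= size)
			return size;
		word = addr[offset / SKETCH_BITS_PER_LONG];
	}

	pos = offset + sketch_ffs(word);
	return pos < size ? pos : size;
}

/* Dedicated search from bit 0: no offset handling, no first-word mask. */
unsigned long sketch_find_first_bit(const unsigned long *addr,
				    unsigned long size)
{
	unsigned long idx, pos;

	for (idx = 0; idx * SKETCH_BITS_PER_LONG < size; idx++) {
		if (addr[idx]) {
			pos = idx * SKETCH_BITS_PER_LONG + sketch_ffs(addr[idx]);
			return pos < size ? pos : size;
		}
	}
	return size;
}

/* What the fallback amounts to: a general search with offset 0. */
unsigned long sketch_find_first_bit_fallback(const unsigned long *addr,
					     unsigned long size)
{
	return sketch_find_next_bit(addr, size, 0);
}
```

Both variants return the same index for any bitmap; the difference is only in
how much per-call work is done before and during the word loop.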
Yury
---
Tested-by: Alexey Klimov <aklimov@...hat.com>
Without GENERIC_FIND_FIRST_BIT:
               Start testing find_bit() with random-filled bitmap
[7126084.864616] find_next_bit:                 9653351 ns, 164280 iterations
[7126084.881146] find_next_zero_bit:            9591974 ns, 163401 iterations
[7126084.893859] find_last_bit:                 5778627 ns, 164280 iterations
[7126084.948181] find_first_bit:               47389224 ns,  16357 iterations
[7126084.958975] find_next_and_bit:             3875849 ns,  73487 iterations
[7126084.965884]
                 Start testing find_bit() with sparse bitmap
[7126084.973474] find_next_bit:                  109879 ns,    655 iterations
[7126084.999365] find_next_zero_bit:           18968440 ns, 327026 iterations
[7126085.006351] find_last_bit:                   80503 ns,    655 iterations
[7126085.032315] find_first_bit:               19048193 ns,    655 iterations
[7126085.039303] find_next_and_bit:               82628 ns,      1 iterations
With GENERIC_FIND_FIRST_BIT enabled:
               Start testing find_bit() with random-filled bitmap
[   84.095335] find_next_bit:                 9600970 ns, 163770 iterations
[   84.111695] find_next_zero_bit:            9613137 ns, 163911 iterations
[   84.124143] find_last_bit:                 5713907 ns, 163770 iterations
[   84.158068] find_first_bit:               27193319 ns,  16406 iterations
[   84.168663] find_next_and_bit:             3863814 ns,  73671 iterations
[   84.175392]
               Start testing find_bit() with sparse bitmap
[   84.182660] find_next_bit:                  112334 ns,    656 iterations
[   84.208375] find_next_zero_bit:           18976981 ns, 327025 iterations
[   84.215184] find_last_bit:                   79584 ns,    656 iterations
[   84.233005] find_first_bit:               11082437 ns,    656 iterations
[   84.239821] find_next_and_bit:               82209 ns,      1 iterations
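
For reference, the find_first_bit speedup implied by the two runs above can be
worked out from the logged figures. The sketch below normalizes by iteration
count, since the two runs differ slightly there; the function names are
hypothetical and the constants are copied from the find_first_bit lines of
the logs.

```c
/*
 * Per-iteration speedup between two benchmark runs:
 * (before_ns / before_iters) / (after_ns / after_iters).
 */
double speedup(double before_ns, double before_iters,
	       double after_ns, double after_iters)
{
	return (before_ns / before_iters) / (after_ns / after_iters);
}

/* find_first_bit, random-filled bitmap: baseline run vs. patched run. */
double random_fill_speedup(void)
{
	return speedup(47389224.0, 16357.0, 27193319.0, 16406.0);
}

/* find_first_bit, sparse bitmap: baseline run vs. patched run. */
double sparse_speedup(void)
{
	return speedup(19048193.0, 655.0, 11082437.0, 656.0);
}
```

Both ratios come out at roughly 1.7x on this board.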
root@...e:~# cpupower -c all frequency-info | grep asserted
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
root@...e:~# lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
Vendor ID:                       ARM
Model:                           4
Model name:                      Cortex-A53
Stepping:                        r0p4
CPU max MHz:                     1152.0000
CPU min MHz:                     648.0000
BogoMIPS:                        48.00
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid