Message-ID: <CAAH8bW-fb0wPwwvo8P8VW33zV=Wi_LPWxdJH8y2wdGGqPE+3nA@mail.gmail.com>
Date: Mon, 7 Dec 2020 17:59:16 -0800
From: Yury Norov <yury.norov@...il.com>
To: Will Deacon <will@...nel.org>
Cc: Catalin Marinas <catalin.marinas@....com>,
linux-arm-kernel@...ts.infradead.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-arch@...r.kernel.org, Alexey Klimov <aklimov@...hat.com>
Subject: Re: [PATCH] arm64: enable GENERIC_FIND_FIRST_BIT
(CC: Alexey Klimov)
On Mon, Dec 7, 2020 at 3:25 AM Will Deacon <will@...nel.org> wrote:
>
> On Sat, Dec 05, 2020 at 08:54:06AM -0800, Yury Norov wrote:
> > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
> > enable it in config. It leads to using find_next_bit() which is less
> > efficient:
>
> [...]
>
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1515f6f153a0..2b90ef1f548e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -106,6 +106,7 @@ config ARM64
> > select GENERIC_CPU_AUTOPROBE
> > select GENERIC_CPU_VULNERABILITIES
> > select GENERIC_EARLY_IOREMAP
> > + select GENERIC_FIND_FIRST_BIT
>
> Does this actually make any measurable difference? The disassembly with
> or without this is _very_ similar for me (clang 11).
>
> Will
On Cortex-A53, the generic find_first_bit() is about 1.7 times as fast as
the find_next_bit()-based fallback, according to lib/find_bit_benchmark.
(Thanks to Alexey for testing.)
Yury
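To illustrate why a dedicated find_first_bit() can beat a find_next_bit() call that starts at offset 0, here is a simplified user-space sketch. It is not the kernel's lib/find_bit.c code: the sketch_* names are hypothetical, and __builtin_ctzl is assumed to be available (GCC or Clang). The find-next variant has to validate and mask a start offset before it can scan; the find-first variant skips that work.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Dedicated first-bit search: no start offset, so no first-word masking. */
static unsigned long sketch_find_first_bit(const unsigned long *addr,
					   unsigned long size)
{
	for (unsigned long idx = 0; idx * BITS_PER_LONG < size; idx++) {
		if (addr[idx]) {
			unsigned long bit = idx * BITS_PER_LONG +
				(unsigned long)__builtin_ctzl(addr[idx]);
			/* Clamp in case the set bit lies past 'size'. */
			return bit < size ? bit : size;
		}
	}
	return size;	/* no bit set */
}

/* General search: must check 'start' and mask off the bits below it. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
					  unsigned long size,
					  unsigned long start)
{
	unsigned long idx, word, bit;

	if (start >= size)
		return size;

	idx = start / BITS_PER_LONG;
	word = addr[idx] & (~0UL << (start % BITS_PER_LONG));

	while (!word) {
		idx++;
		if (idx * BITS_PER_LONG >= size)
			return size;
		word = addr[idx];
	}

	bit = idx * BITS_PER_LONG + (unsigned long)__builtin_ctzl(word);
	return bit < size ? bit : size;
}
```

Both functions return the same result when start is 0; the difference is only in the setup work done before the scan loop.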
---
Tested-by: Alexey Klimov <aklimov@...hat.com>
Without GENERIC_FIND_FIRST_BIT:
Start testing find_bit() with random-filled bitmap
[7126084.864616] find_next_bit: 9653351 ns, 164280 iterations
[7126084.881146] find_next_zero_bit: 9591974 ns, 163401 iterations
[7126084.893859] find_last_bit: 5778627 ns, 164280 iterations
[7126084.948181] find_first_bit: 47389224 ns, 16357 iterations
[7126084.958975] find_next_and_bit: 3875849 ns, 73487 iterations
[7126084.965884]
Start testing find_bit() with sparse bitmap
[7126084.973474] find_next_bit: 109879 ns, 655 iterations
[7126084.999365] find_next_zero_bit: 18968440 ns, 327026 iterations
[7126085.006351] find_last_bit: 80503 ns, 655 iterations
[7126085.032315] find_first_bit: 19048193 ns, 655 iterations
[7126085.039303] find_next_and_bit: 82628 ns, 1 iterations
With GENERIC_FIND_FIRST_BIT enabled:
Start testing find_bit() with random-filled bitmap
[ 84.095335] find_next_bit: 9600970 ns, 163770 iterations
[ 84.111695] find_next_zero_bit: 9613137 ns, 163911 iterations
[ 84.124143] find_last_bit: 5713907 ns, 163770 iterations
[ 84.158068] find_first_bit: 27193319 ns, 16406 iterations
[ 84.168663] find_next_and_bit: 3863814 ns, 73671 iterations
[ 84.175392]
Start testing find_bit() with sparse bitmap
[ 84.182660] find_next_bit: 112334 ns, 656 iterations
[ 84.208375] find_next_zero_bit: 18976981 ns, 327025 iterations
[ 84.215184] find_last_bit: 79584 ns, 656 iterations
[ 84.233005] find_first_bit: 11082437 ns, 656 iterations
[ 84.239821] find_next_and_bit: 82209 ns, 1 iterations
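The find_first_bit timings above can be turned into a speedup factor by dividing the time without the option by the time with it. The helper below is a hypothetical illustration (the name speedup is not from the benchmark); the constants are copied from the find_first_bit lines of the two runs. Both ratios come out at roughly 1.7.

```c
/* find_first_bit timings in ns, copied from the logs above. */
#define RANDOM_BEFORE_NS 47389224ULL	/* random-filled, option off */
#define RANDOM_AFTER_NS  27193319ULL	/* random-filled, option on  */
#define SPARSE_BEFORE_NS 19048193ULL	/* sparse, option off */
#define SPARSE_AFTER_NS  11082437ULL	/* sparse, option on  */

/* Speedup factor: how many times faster the "after" run is. */
static double speedup(unsigned long long before_ns,
		      unsigned long long after_ns)
{
	return (double)before_ns / (double)after_ns;
}
```

For example, speedup(RANDOM_BEFORE_NS, RANDOM_AFTER_NS) gives the factor for the random-filled bitmap, and speedup(SPARSE_BEFORE_NS, SPARSE_AFTER_NS) the factor for the sparse one. The iteration counts differ slightly between runs (16357 vs 16406, 655 vs 656), so the ratios are approximate.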
root@...e:~# cpupower -c all frequency-info | grep asserted
current CPU frequency: 648 MHz (asserted by call to hardware)
current CPU frequency: 648 MHz (asserted by call to hardware)
current CPU frequency: 648 MHz (asserted by call to hardware)
current CPU frequency: 648 MHz (asserted by call to hardware)
root@...e:~# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 4
Model name: Cortex-A53
Stepping: r0p4
CPU max MHz: 1152.0000
CPU min MHz: 648.0000
BogoMIPS: 48.00
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid