lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 15 Sep 2022 17:25:24 +0100 From: Valentin Schneider <vschneid@...hat.com> To: Yury Norov <yury.norov@...il.com>, Linus Torvalds <torvalds@...ux-foundation.org>, linux-kernel@...r.kernel.org Cc: Yury Norov <yury.norov@...il.com>, Alexey Klimov <klimov.linux@...il.com>, Andy Shevchenko <andriy.shevchenko@...ux.intel.com>, Andy Whitcroft <apw@...onical.com>, Catalin Marinas <catalin.marinas@....com>, David Laight <David.Laight@...LAB.COM>, Dennis Zhou <dennis@...nel.org>, Guenter Roeck <linux@...ck-us.net>, Kees Cook <keescook@...omium.org>, Rasmus Villemoes <linux@...musvillemoes.dk>, Sven Schnelle <svens@...ux.ibm.com>, Russell King <linux@...linux.org.uk> Subject: Re: [PATCH v4 3/4] lib/find_bit: optimize find_next_bit() functions On 14/09/22 19:07, Yury Norov wrote: > Over the past couple years, the function _find_next_bit() was extended > with parameters that modify its behavior to implement and- zero- and le- > flavors. The parameters are passed at compile time, but current design > prevents a compiler from optimizing out the conditionals. > > As find_next_bit() API grows, I expect that more parameters will be added. > Current design would require more conditional code in _find_next_bit(), > which would bloat the helper even more and make it barely readable. > > This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds > a set of wrappers, so that the compile-time optimizations become possible. > > The common logic is moved to the new macro, and all flavors may be > generated by providing a FETCH macro parameter, like in this example: > > #define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ... > > find_next_xornot_and_bit(addr1, addr2, addr3, size, start) > { > return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx], > /* nop */, size, start); > } > > The FETCH may be of any complexity, as soon as it only refers the bitmap(s) > and an iterator idx. > > MUNGE is here to support _le code generation for BE builds. May be > empty. > > I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top > of 6.0-rc2 + this series. The results for kvm/x86_64 are: > > v6.0-rc2 Optimized Difference Z-score > Random dense bitmap ns ns ns % > find_next_bit: 787735 670546 117189 14.9 3.97 > find_next_zero_bit: 777492 664208 113284 14.6 10.51 > find_last_bit: 830925 687573 143352 17.3 2.35 > find_first_bit: 3874366 3306635 567731 14.7 1.84 > find_first_and_bit: 40677125 37739887 2937238 7.2 1.36 > find_next_and_bit: 347865 304456 43409 12.5 1.35 > > Random sparse bitmap > find_next_bit: 19816 14021 5795 29.2 6.10 > find_next_zero_bit: 1318901 1223794 95107 7.2 1.41 > find_last_bit: 14573 13514 1059 7.3 6.92 > find_first_bit: 1313321 1249024 64297 4.9 1.53 > find_first_and_bit: 8921 8098 823 9.2 4.56 > find_next_and_bit: 9796 7176 2620 26.7 5.39 > > Where the statistics is significant (z-score > 3), the improvement > is ~15%. > > According to the bloat-o-meter, the Image size is 10-11K less: > > x86_64/defconfig: > add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177) > > arm64/defconfig: > add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948) > > Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org> > Signed-off-by: Yury Norov <yury.norov@...il.com> Reviewed-by: Valentin Schneider <vschneid@...hat.com>
Powered by blists - more mailing lists