lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 25 Jul 2022 13:35:48 -0700 From: Yury Norov <yury.norov@...il.com> To: Linus Torvalds <torvalds@...ux-foundation.org> Cc: Guenter Roeck <linux@...ck-us.net>, Dennis Zhou <dennis@...nel.org>, Russell King - ARM Linux <linux@...linux.org.uk>, Catalin Marinas <catalin.marinas@....com>, Linux Kernel Mailing List <linux-kernel@...r.kernel.org> Subject: Re: Linux 5.19-rc8 On Mon, Jul 25, 2022 at 11:49:09AM -0700, Linus Torvalds wrote: > On Mon, Jul 25, 2022 at 10:55 AM Linus Torvalds > <torvalds@...ux-foundation.org> wrote: > > > > I think the fix might be something like this: > > Hmm. Maybe the fix is to just not have the arm architecture-specific > version at all. I agree (see my other email in the thread). If no objections from ARM people, I can drop it. > The generic code handles the "small constant size bitmap that fits in > a word" case better than the ARM special case code does. > > And the generic code handles the "scan large bitmap" case better than > the ARM code does too, in that it does things a word at a time, while > the ARM special case code does things one byte at a time. > > The ARM code does have a few things going for it: > > (a) it's simple > > (b) it has separate routines for the little-endian case > > Now, (a) is probably not too strong an argument, because it's arguably > *too* simple, and buggy as a result. And having looked a bit more, > it's not just _find_next_bit_le() that has this bug, it's all the > "next" versions (zero-bit and big-endian). > > But (b) is actively better than what the generic "find bit" code has. > The generic code is kind of disgusting in this area, with code like > > if (le) > tmp = swab(tmp); The patch that adds this is: b78c57135d470 ("lib/find_bit.c: join _find_next_bit{_le}"), so I did that on purpose. > in lib/find_bit.c and this is nasty for two reasons: > > (1) on little-endian, the "le" flag is mis-named: it's always zero, > and it never should swab, but the code was taken from some big-endian > generic case Yes, the "le" is a bad name, and I think it should be changed. Are you OK with "need_swab"? > (2) even on big-endian, that "le" flag is basically a compile-time > constant, but the header file shenanigans have hidden that fact, so it > ends up being a "pass a constant to a function that then has to test > it dynamically" situation I think it's not measurable, at least find_bit_benchmark didn't get worse. Even if find_next_bit() is invoked many times in a loop, we can expect that branch predictor would optimize the difference out. > So the generic code is in many ways better than the ARM special case > code, but it has a couple of unfortunate warts too. At least those > unfortunate warts aren't outright *bugs*, they are just ugly. Here we have 2 ugly options - having pairs of almost identical functions, or passing dummy variables. I decided that copy-pasting is worse than abusing branch predictor. Thanks, Yury
Powered by blists - more mailing lists