[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yt7+pPyEmYYH8D1K@yury-laptop>
Date: Mon, 25 Jul 2022 13:35:48 -0700
From: Yury Norov <yury.norov@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Guenter Roeck <linux@...ck-us.net>,
Dennis Zhou <dennis@...nel.org>,
Russell King - ARM Linux <linux@...linux.org.uk>,
Catalin Marinas <catalin.marinas@....com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 5.19-rc8
On Mon, Jul 25, 2022 at 11:49:09AM -0700, Linus Torvalds wrote:
> On Mon, Jul 25, 2022 at 10:55 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > I think the fix might be something like this:
>
> Hmm. Maybe the fix is to just not have the arm architecture-specific
> version at all.
I agree (see my other email in the thread). If no objections from ARM
people, I can drop it.
> The generic code handles the "small constant size bitmap that fits in
> a word" case better than the ARM special case code does.
>
> And the generic code handles the "scan large bitmap" case better than
> the ARM code does too, in that it does things a word at a time, while
> the ARM special case code does things one byte at a time.
>
> The ARM code does have a few things going for it:
>
> (a) it's simple
>
> (b) it has separate routines for the little-endian case
>
> Now, (a) is probably not too strong an argument, because it's arguably
> *too* simple, and buggy as a result. And having looked a bit more,
> it's not just _find_next_bit_le() that has this bug, it's all the
> "next" versions (zero-bit and big-endian).
>
> But (b) is actively better than what the generic "find bit" code has.
> The generic code is kind of disgusting in this area, with code like
>
> if (le)
> tmp = swab(tmp);
The patch that adds this is: b78c57135d470 ("lib/find_bit.c: join
_find_next_bit{_le}"), so I did that on purpose.
> in lib/find_bit.c and this is nasty for two reasons:
>
> (1) on little-endian, the "le" flag is mis-named: it's always zero,
> and it never should swab, but the code was taken from some big-endian
> generic case
Yes, the "le" is a bad name, and I think it should be changed. Are you OK
with "need_swab"?
> (2) even on big-endian, that "le" flag is basically a compile-time
> constant, but the header file shenanigans have hidden that fact, so it
> ends up being a "pass a constant to a function that then has to test
> it dynamically" situation
I think it's not measurable, at least find_bit_benchmark didn't get worse.
Even if find_next_bit() is invoked many times in a loop, we can expect
that branch predictor would optimize the difference out.
> So the generic code is in many ways better than the ARM special case
> code, but it has a couple of unfortunate warts too. At least those
> unfortunate warts aren't outright *bugs*, they are just ugly.
Here we have 2 ugly options - having pairs of almost identical
functions, or passing dummy variables. I decided that copy-pasting is
worse than abusing branch predictor.
Thanks,
Yury
Powered by blists - more mailing lists