linux-kernel - Re: Linux 5.19-rc8

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Yt7+pPyEmYYH8D1K@yury-laptop>
Date:   Mon, 25 Jul 2022 13:35:48 -0700
From:   Yury Norov <yury.norov@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Guenter Roeck <linux@...ck-us.net>,
        Dennis Zhou <dennis@...nel.org>,
        Russell King - ARM Linux <linux@...linux.org.uk>,
        Catalin Marinas <catalin.marinas@....com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 5.19-rc8

On Mon, Jul 25, 2022 at 11:49:09AM -0700, Linus Torvalds wrote:
> On Mon, Jul 25, 2022 at 10:55 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > I think the fix might be something like this:
> 
> Hmm. Maybe the fix is to just not have the arm architecture-specific
> version at all.

I agree (see my other email in the thread). If no objections from ARM
people, I can drop it.

> The generic code handles the "small constant size bitmap that fits in
> a word" case better than the ARM special case code does.
> 
> And the generic code handles the "scan large bitmap" case better than
> the ARM code does too, in that it does things a word at a time, while
> the ARM special case code does things one byte at a time.
> 
> The ARM code does have a few things going for it:
> 
>  (a) it's simple
> 
>  (b) it has separate routines for the little-endian case
> 
> Now, (a) is probably not too strong an argument, because it's arguably
> *too* simple, and buggy as a result. And having looked a bit more,
> it's not just _find_next_bit_le() that has this bug, it's all the
> "next" versions (zero-bit and big-endian).
> 
> But (b) is actively better than what the generic "find bit" code has.
> The generic code is kind of disgusting in this area, with code like
> 
>         if (le)
>                 tmp = swab(tmp);

The patch that adds this is: b78c57135d470 ("lib/find_bit.c: join
_find_next_bit{_le}"), so I did that on purpose.

> in lib/find_bit.c and this is nasty for two reasons:
> 
>  (1) on little-endian, the "le" flag is mis-named: it's always zero,
> and it never should swab, but the code was taken from some big-endian
> generic case

Yes, the "le" is a bad name, and I think it should be changed. Are you OK
with "need_swab"?
 
>  (2) even on big-endian, that "le" flag is basically a compile-time
> constant, but the header file shenanigans have hidden that fact, so it
> ends up being a "pass a constant to a function that then has to test
> it dynamically" situation

I think it's not measurable, at least find_bit_benchmark didn't get worse.
Even if find_next_bit() is invoked many times in a loop, we can expect
that branch predictor would optimize the difference out.
 
> So the generic code is in many ways better than the ARM special case
> code, but it has a couple of unfortunate warts too. At least those
> unfortunate warts aren't outright *bugs*, they are just ugly.

Here we have 2 ugly options - having pairs of almost identical
functions, or passing dummy variables. I decided that copy-pasting is
worse than abusing branch predictor.

Thanks,
Yury