linux-kernel - Re: [PATCH 1/5] ARM: findbit: document ARMv5 bit offset calculation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wh75aOWvaEhYsAcCJSRzhy8MO_SdvDzB6zmvVs+FeAr6w@mail.gmail.com>
Date:   Fri, 28 Oct 2022 12:01:00 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     Yury Norov <yury.norov@...il.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Mark Rutland <mark.rutland@....com>,
        Will Deacon <will@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 1/5] ARM: findbit: document ARMv5 bit offset calculation

On Fri, Oct 28, 2022 at 10:45 AM Russell King (Oracle)
<linux@...linux.org.uk> wrote:
>
> For the _find_first_bit, there isn't much difference in the number
> of instructions or really what is going on, only the organisation
> and flow of the code is more inline - but that shouldn't make much
> of a difference. Yet, there is a definite repeatable measurable
> difference between the two:

Hmm. Interestingly, your _find_first_zero_bit_le() (which
find_next_bit ends up using except for the first byte) ends up doing
an optimization that is technically not valid.

In particular, the *generic* code does

                        sz = min(idx * BITS_PER_LONG + __ffs(MUNGE(val)), sz);

for the final result.

In contrast, the arm code doesn't do the "min()" at all, and if there
are bits after the bitmap (in a partial byte), it will just return
those bits.

So the arm code ends up avoiding some operations. Which works most of
the time, because

 (a) usually bitmaps are byte-aligned anyway

 (b) most users do "find_first_bit(.., size) >= size" as the "found no
bits" test

but it actually looks to me like your handcoded arm code is simply
wrong. At least going by our docbook comments for find_first_bit:

 * Returns the bit number of the first set bit.
 * If no bits are set, returns @size.

And look here: bitmap_empty() does

        return find_first_bit(src, nbits) == nbits;

and now imagine that 'nbits' is not a small constant value (which we
handle separately) and is also not byte aligned.

Maybe I'm mis-reading your assembly (I "know" arm assembly, but I
can't read it fluently like x86). But I don't think so.

So I think your code is actually buggy, but probably the bug is quite
hard to trigger in practice due to (a)/(b) above.

We do have bitmaps that aren't byte-aligned. The cpumask ones are the
most common ones. But in the cpumask_first() implementation (which is
just a wrapper for find_first_bit()), our documentation actually says

 * Returns >= nr_cpu_ids if no cpus set.

and I think that may have been what we historically did elsewhere too,
and may be the source of the arm situation.

Anyway, this can be fixed by either

 (a) fixing the arm code

 (b) changing the docs and making that ">= size" be the right thing to do

but I do think this is a very strong example of why having
architecture-specific code like this is very very dangerous. Because
as it stands now, that arm code really looks like it's actively buggy
to me.

         Linus