netdev - Re: [GIT PULL] bitmap changes for v6.2-rc1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wj_4xsWxLqPvkCV6eOJt7quXS8DyXn3zWw3W94wN=6yig@mail.gmail.com>
Date:   Fri, 23 Dec 2022 10:44:07 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Yury Norov <yury.norov@...il.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Tariq Toukan <tariqt@...dia.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-rdma@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [GIT PULL] bitmap changes for v6.2-rc1

On Thu, Dec 15, 2022 at 3:11 PM Yury Norov <yury.norov@...il.com> wrote:
>
> Please pull bitmap patches for v6.2. They spent in -next for more than
> a week without any issues. The branch consists of:

So I've been holding off on this because these bitmap pulls have
always scared me, and I wanted to have the time to actually look
through them in detail before pulling.

I'm back home, over the travel chaos, and while I have other pulls
pending, they seem to be benign fixes so I started looking at this.

And when looking at it, I did indeed finx what I think is a
fundamental arithmetic bug.

That small_const_nbits_off() is simply buggy.

Try this:

        small_const_nbits_off(64,-1);

and see it return true.

The thing is, doing divides in C is something you need to be very
careful about, because they do *not* behave nicely with signed values.
They do the "mathematically intuitive" thing of truncating towards
zero, but when you are talking about bit ranges like this, that is
*NOT* what you want.

The comment is fairly good, but it just doesn't match the code.

If you want to check that 'nbits-1' and 'off' are in the same word,
you simply should not use division at all. Sure, it would work, if you
verify that they are both non-negative, but the fact is, even then
it's just wrong, wrong, wrong. You want a shift operation or a masking
operation.

Honestly, in this case, I think the logical thing to do is "check that
the upper bits are the same". The way you do that is probably
something like

   !((off) ^ ((nbits)-1) & ~(BITS_PER_LONG-1))

which doesn't have the fundamental bug that the divide version has.

So anyway, I won't be pulling this. I appreciate that you are trying
to micro-optimize the bitop functions, but this is not the first time
your pull requests have made me worried and caused me to not pull.
This is such basic functionality that I really need these pull
requests to be less worrisome.

In other words, not only don't I want to see these kinds of bugs -
please make sure that there isn't anythign that looks even *remotely*
scary. The micro-optimizations had better be *so* obvious and so well
thought through that I not only don't find bugs in them, I don't even
get nervous about finding bugs.

Side note: that whole small_const_nbits_off() thing could be possibly
improved in other ways. It's actually unnecessarily strict (if it was
correct, that is): we don't really require 'off' to be constant, what
we *do* want is

 - "nbits > off" needs to be a compile-time true constant

 - that "off is in the same word as nbits-1" also needs to be a
compile-time true constant.

and the compiler could determine both of those to be the case even if
'off' itself isn't a constant, if there is code around it that
verifies it.

For example, if you have code like this:

        off = a & 7;
        n = find_next_bit(bitmap, 64, off);

then the compiler could still see that it's that "last word only"
case, even though 'off' isn't actually a constant value.

So you could try using a helper macro like

   #define COMPILE_TIME_TRUE(x) (__builtin_constant_p(x) && (x))

to handle those kinds of things. But again - the code needs to be
OBVIOUSLY CORRECT. Make me feel those warm and fuzzies about it
because it's so obviously safe and correct.

That said, I see your test-cases for your micro-optimizations, but I'd
also like to see references to where it actually improves real code.

I know for a fact that 'small_const_nbits()' actually triggers in real
life. I don't see that your 'small_const_nbits_off()' would trigger in
real life in any situation that isn't already covered by
'small_const_nbits()'.

So convince me not only that the optimizations are obviously correct,
but also that they actually matter.

                Linus