lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjQCbRA1UEag-1-9yn08KNNqerTj++SCbbW80At=rg5RQ@mail.gmail.com>
Date: Sat, 22 Feb 2025 10:42:23 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>, Jens Axboe <axboe@...nel.dk>
Cc: Martin Uecker <uecker@...raz.at>, Dan Carpenter <dan.carpenter@...aro.org>, 
	Greg KH <gregkh@...uxfoundation.org>, Boqun Feng <boqun.feng@...il.com>, 
	"H. Peter Anvin" <hpa@...or.com>, Miguel Ojeda <miguel.ojeda.sandonis@...il.com>, 
	Christoph Hellwig <hch@...radead.org>, rust-for-linux <rust-for-linux@...r.kernel.org>, 
	David Airlie <airlied@...il.com>, linux-kernel@...r.kernel.org, ksummit@...ts.linux.dev
Subject: Re: Rust kernel policy

On Fri, 21 Feb 2025 at 14:23, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> If I could just get a warning for this stupid mistake:
>
>         size_t ret;
>
>         ret = func();
>         if (ret < 0)
>                 error();
>
> I'd be very happy.

I really don't think the issue here should be considered "ret is
unsigned, so checking against zero is wrong".

Because as mentioned, doing range checks is always correct. A compiler
must not complain about that. So I still think that the horrid
"-Wtype-limits" warning is completely misguided.

No, the issue should be seen as "you got a signed value, then you
unwittingly cast it to unsigned, and then you checked if it's
negative".

That pattern is "easy" to check against in SSA form (because SSA is
really a very natural form for "where did this value come from"), and
I wrote a test patch for sparse.

But this test patch is actually interestign because it does show how
hard it is to give meaningful warnings.

Why? Because SSA is basically the "final form" before register
allocation and code generation - and that means that sparse (or any
real compiler) has already done a *lot* of transformations on the
source code. Which in turn means that sparse actually finds places
that have that pattern, not because the code was written as an
unsigned compare of something that used to be a signed value, but
because various simplifications had turned it into that.

Let me give a couple of examples. First, the actual case you want to
find as a test-case for sparse:

   typedef unsigned long size_t;
   extern int fn(void);
   extern int check(void);

   int check(void)
   {
        size_t ret = fn();
        if (ret < 0)
                return -1;
        return 0;
   }

which makes sparse complain about it:

    t.c:8:19: warning: unsigned value that used to be signed checked
for negative?
    t.c:7:24: signed value source

Look, that's nice (ok, I notice that the "offset within line" fields
have regressed at some point, so ignore that).

It tells you that you are doing an unsigned odd compare against zero
of a value that *used* to be signed, and tells you where the value
originated from.

Perfect, right?

Not so fast.

It actually doesn't result in very many warnings in the current kernel
when I run sparse over it all, so on the face of it it all seems like
a nice good safe warning that doesn't cause a lot of noise.

But then when looking at the cases it *does* find, they are very very
subtle indeed. A couple of them look fine:

    drivers/gpu/drm/panel/panel-samsung-s6e3ha2.c:455:26: warning:
unsigned value that used to be signed checked for negative?
    drivers/gpu/drm/panel/panel-samsung-s6e3ha2.c:452:35: signed value source

which turns out to be this:

        unsigned int brightness = bl_dev->props.brightness;
        ...
        if (brightness < S6E3HA2_MIN_BRIGHTNESS ||
                brightness > bl_dev->props.max_brightness) {

and that's actually pretty much exactly your pattern: 'brightness' is
indeed 'unsigned int', and S6E3HA2_MIN_BRIGHTNESS is indeed zero, and
the *source* of it all is indeed a signed value
(bl_dev->props.brightness is 'int' from 'struct
backlight_properties').

So the warning looks fine, and all it really should need is some extra
logic to *not* warn when there is also an upper bounds check (which
makes it all sane again), That warning is wrong because it's not smart
enough, but it's not "fundamentally wrong" like the type-based one
was. Fine so far.

And the sparse check actually finds real issues:

For example, it finds this:

    drivers/block/zram/zram_drv.c:1234:20: warning: unsigned value
that used to be signed checked for negative?
    drivers/block/zram/zram_drv.c:1234:13: signed value source

which looks odd, because it's all obviously correct:

        if (prio < ZRAM_PRIMARY_COMP || prio >= ZRAM_MAX_COMPS)
                return -EINVAL;

and 'prio' is a plain 'int'. So why would sparse have flagged it?

It's because ZRAM_PRIMARY_COMP is thgis:

        #define ZRAM_PRIMARY_COMP 0U

so while 'prio' is indeed signed, and checking it against 0 would be
ok, checking it against 0U is *NOT* ok, because it makes the whole
comparison unsigned.

So yes, sparse found a subtle mistake. A bug that looks real, although
one where it doesn't matter (because ZRAM_MAX_COMPS is *also* an
unsigned constant, so the "prio >= ZRAM_MAX_COMPS" test will make it
all ok, and negative values are indeed checked for).

Again, extending the patch to notice when the code does a unsigned
range check on the upper side too would make it all ok.

Very cool. Short, sweet, simple sparse patch that finds interesting
places, but they seem to be false positives.

In fact, it finds some *really* interesting patterns. Some of them
don't seem to be false positives at all.

For example, it reports this:

    ./include/linux/blk-mq.h:877:31: warning: unsigned value that used
to be signed checked for negative?
    drivers/block/null_blk/main.c:1550:46: signed value source

and that's just

                if (ioerror < 0)
                        return false;

and 'ioerror' is an 'int'. And here we're checking against plain '0',
not some subtle '0U' thing. So it's clearly correct, and isn't an
unsigned compare at all. Why would sparse even mention it?

The 'signed value source' gives a hint. This is an inline, and the
caller is this:

                cmd->error = null_process_cmd(cmd, req_op(req), blk_rq_pos(req),
                                                blk_rq_sectors(req));
                if (!blk_mq_add_to_batch(req, iob, (__force int) cmd->error,
                                        blk_mq_end_request_batch))

iow, the error is that 'cmd->error' thing, and that is starting to
give a hint about what sparse found. Sparse found a bug.

That '(__force int) cmd->error' is wrong. cmd->error is a blk_status_t, which is

        typedef u8 __bitwise blk_status_t;

which means that when cast to 'int', it *CANNOT* be negative. You're
supposed to use 'blk_status_to_errno()' to make it an error code. The
code is simply buggy, and what has happened is that sparse noticed
that the source of the 'int' was a 8-bit unsigned char, and then
sparse saw the subsequent compare, and said "it's stupid to do a 8-bit
to 32-bit type extension and then do the compare as a signed 32-bit
compare: I'll do it as a unsigned 8-bit compare on the original
value".

And then it noticed that as an unsigned 8-bit compare it made no sense any more.

Look, ma - it's the *perfect* check. Instead of doing the
(wrongheaded) type limit check, it's doing the *right* thing. It's
finding places where you actually mis-use unsigned compares.

No. It also finds a lot of really subtle stuff that is very much
correct, exactly because it does that kind of "oh, the source is a
16-bit unsigned field that has been turned into an 'int', and then
compared against zero" and complains about them.

And usually those complaints are bogus, because the "< 0" is important
in inline functions that do range checking on values that *can* be
negative, but just don't happen to be negative in this case because
the source couldn't be negative and earlier simplifications had turned
a signed compare into an unsigned one, so it now talks about that.

Oh well.

I'm adding Jens to the cc, because I do think that the

    drivers/block/null_blk/main.c:1550:46: signed value source

thing is a real bug, and that doing that (__force int) cmd->error is
bogus. I doubt anybody cares (it's the null_blk driver), but still..

I also pushed out the sparse patch in case anybody wants to play with
this, but while I've mentioned a couple of "this looks fine but not
necessarily relevant" warnings, the limitations of that patch really
do cause completely nonsensical warnings:

    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/sparse.git
unsigned-negative

Not a ton of them, but some. bcachefs actually gets a number of them,
it looks like the games it plays with bkeys really triggers some of
that. I'm almost certain those are false positives, but exactly
because sparse goes *so* deep (there's tons of macros in there, but it
also follows the data flow through inline functions into the source of
the data), it can be really hard to see where it all comes from.

Anyway - good compiler warnings are really hard to generate. But
-Wtype-limits is *not* one of them.

               Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ