[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgoE5EkH+sQwi4KhRhCZizUxwZAnC=+9RbZcw7g6016LQ@mail.gmail.com>
Date: Wed, 8 May 2024 16:47:25 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Kees Cook <keescook@...omium.org>
Cc: Justin Stitt <justinstitt@...gle.com>, Peter Zijlstra <peterz@...radead.org>,
Mark Rutland <mark.rutland@....com>, linux-hardening@...r.kernel.org,
linux-kernel@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [RFC] Mitigating unexpected arithmetic overflow
On Wed, 8 May 2024 at 15:54, Kees Cook <keescook@...omium.org> wrote:
>
> Sure, but if the bar is omniscience, that's not a path forward.
I really don't think there's a lot of omniscience involved at all.
> I haven't really heard a viable counterproposal here.
So let me make the counter-proposal that you actually concentrate on
very limited cases.
Baby steps, in other words.
For example, the most common case of overflow we've ever had has very
much been array indexing. Now, sometimes that has actually been actual
undefined behavior, because it's been overflow in signed variables,
and those are "easy" to find in the sense that you just say "no, can't
do that". UBSAN finds them, and that's good.
So I'd suggest seeing if we can catch the cases that *technically*
aren't UB, but match dangerous overflow situations. IOW, start with
very limited and very targeted things to look out for.
For example: some of the "this is not UB" may actually be trivial
mistakes that come about because of the silent C type casting rules:
overflow in unsigned arithmetic isn't UB, but it's actually very easy
to make arithmetic unsigned "by mistake".
IOW, a pattern to look out for might be
int a;
...
a * sizeof(xyz);
where "sizeof" obviously is unsigned, and then the C integer
promotions end up making this a well-defined operation that probably
really *is* dangerous, because the integer promotion basically adds an
implicit cast from "int" to "size_t" in there.
End result: it's not UB, but maybe it is not-*accidentally*-UB, and
thus maybe worth a warning? See what I'm saying?
So *that* I feel could be something where you can warn without a ton
of compiler smarts at all. If you see an *implicit* cast to unsigned
and then the subsequent operations wraps around, it's probably worth
being a lot more worried about.
Now, another thing to do might be to treat assignments (and again,
implicit casts) to 'size_t' specially. In most compilers (certainly in
gcc), "size_t" is a special type.
Now, in the kernel, we don't actually use __builtin_size_t (the kernel
started out with a very standalone type system, and I think it
actually predated gcc's __builtin_size_t or maybe I just wasn't aware
of it). But we certainly _could_ - or we could at least use some
special marked type for our own 'size_t'.
And basically try to catch cases where arithmetic is explicitly used
for a size calculation: things like 'kmalloc()' etc are the obvious
things. And yes, the result will then have an (implicit) cast to
'size_t' as part of the calling convention.
That's another "pattern that sounds relevant and easy to notice for a compiler".
And, finally, simply pointer arithmetic. Again, this is *not* some
"omniscience required". This is something very obvious where the
compiler is *very* aware of it being pointer arithmetic, because
pointer arithmetic has that implicit (and important) "take size of
object into account".
So again, if you see an integer expression that leads to pointer
arithmetic, at that point the overflow very much is relevant.
See? Three very *targeted* things where
(a) the compiler is very much aware of the use
(b) you can fairly confidently say "integer arithmetic wraparound is
a dangerous thing even when not UB".
So what I object to - loudly - is some completely bogus argument of
"integer wraparound is always wrong".
And dammit, I have seen you state something that sounds very much like
that several times, most recently in just this thread where you were
arguing for a special "u32_wrap" type just to allow wrapping without
complaints.
And I think it's that overly-generic "wrap-around is wrong" argument
that is completely bogus and needs to be nipped in the bud.
Wrap-around is not only well-defined, it's *useful*, and it's *used*,
and I find tools that warn about good code (like our existing hash
functions) to be *HORRIBLE*.
False positive warnings make real warnings worthless.
And so I think warning about wrap-around in general is mindless and
wrong and actively counter-productive. If people then just "fix" the
warning by casting it to "u32_wrap", you have only caused problems.
But catching wrap-around in *specific* cases, if you can argue for why
it's wrong in *those* cases: that sounds fine.
And I tried to give three examples of such specific cases above - I'm
sure there are many others.
To conclude: I think the path forward is not at all omniscience. It's
"limit yourself to cases that matter". Don't think of wrap-around as
some "global evil" like you seem to do. Think of it as wrong in
_specific_ cases, and see if you can target those cases.
Start small, in other words.
Now, to take the specific example you brought up: commits a723968c0ed3
and 382c27f4ed28. Honestly, I think you'll find that that one wouldn't
have been caught by some arithmetic wrap-around issue, because the
*arithmetic* was actually done in a type that didn't even wrap around.
IOW, in that case, you had "int size", and none of the arithmetic on
the size actually wrapped.
No, the only point where that actually failed was then when a
(non-overflowing, non-wrapping) final value was assigned to a 16-bit
field, ie the problem only ever happened at the final assignment:
event->read_size = size;
where no overflow had ever happened before that.
So in *that* case, you actually have a much more interesting
situation. Not wrap-around, not overflow, but "implicit cast drops
significant bits".
And yes, I do think implicit integer casts are dangerous. Often *more*
dangerous than arithmetic overflow and wrapping. We've had absolutely
tons of them. Some of our most traditional bugs have very much been
about implicit casting losing bits and causing problems as a result.
In fact, that's the reason we have MAX_RW_COUNT - because we used to
have too many places where various drivers or filesystems ended up
casting a "size_t" count argument down to "int count". So we are
literally violating POSIX semantics for some of the most important
system calls (read() and write()) as a safety measure against hidden
implicit truncating casts.
I think that would be a completely different area that might be worth
looking at: instrumenting implicit casts for "drops bits". I'm afraid
that it's just *so* common than we might not be able to do that
sanely.
But again, maybe it would be worth looking into, at least for some
limited cases. To go back to your particular example, it might be
worth trying to limit it for unusual type sizes like implicit casts to
'u16' or bitfields: not because those type sizes are magical, but
because it might be a way to limit the pain.
Summary: targeted baby steps, not some draconian "wrap-around is wrong".
Linus
Powered by blists - more mailing lists