[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202405081354.B0A8194B3C@keescook>
Date: Wed, 8 May 2024 15:54:36 -0700
From: Kees Cook <keescook@...omium.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Justin Stitt <justinstitt@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Mark Rutland <mark.rutland@....com>,
linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org,
llvm@...ts.linux.dev
Subject: Re: [RFC] Mitigating unexpected arithmetic overflow
On Wed, May 08, 2024 at 01:07:38PM -0700, Linus Torvalds wrote:
> On Wed, 8 May 2024 at 12:45, Kees Cook <keescook@...omium.org> wrote:
> >
> > On Wed, May 08, 2024 at 10:52:44AM -0700, Linus Torvalds wrote:
> > > Example:
> > >
> > > static inline u32 __hash_32_generic(u32 val)
> > > {
> > > return val * GOLDEN_RATIO_32;
> > > }
> >
> > But what about:
> >
> > static inline u32 __item_offset(u32 val)
> > {
> > return val * ITEM_SIZE_PER_UNIT;
> > }
>
> What about it? I'm saying that your tool NEEDS TO BE SMART ENOUGH to
> see the "what about".
This is verging on omniscience, though. Yes, some amount of additional
reasoning can be applied under certain conditions (though even that would
still require some kind of annotation), but GCC can't even reliably figure
out if a variable is unused in a single function. We're not going to be
able to track overflow tainting throughout the code base in a useful
fashion.
> IOW, exactly the same as "a+b < a". Yes, "a+b" might wrap around, but
> if the use is to then compare it with one of the addends, then clearly
> such a wrap-around is fine.
Yes, but it is an absolutely trivial example: it's a single expression
with only 2 variables. Proving that the value coming in from some
protocol that is passed through and manipulated by countless layers of the
kernel before it gets used inappropriately is not a reasonably solvable
problem. But we can get at the root cause: the language (and our use of
it) needs to change to avoid the confused calculation in the first place.
> A tool that doesn't look at how the result is used, and just blindly
> says "wrap-around error" is a tool that I think is actively
> detrimental.
I do hear what you're saying here. I think you're over-estimating the
compiler's abilities. And as I mentioned in the RFC, finding known bad
cases to protect is the reverse of what we want for coverage: we want
to _not_ cover the things we know to be _safe_. So we want to find
(via compiler or annotation) the cases where overflow is _expected_.
> We already expect a lot of kernel developers. We should not add on to
> that burden because of your pet project.
I think it's a misunderstanding to consider this a pet project. C's lack
of overflow intent has spawned entire language ecosystems in reaction.
There are entire classes of flaws that exist specifically because of this
kind of confusion in C. It is the topic of countless PhD research efforts.
People who care about catching overflows can slowly add the type
annotations over the next many years, and we'll reach reasonable coverage
soon enough. It doesn't seem onerous: I'm not asking that people even
do it themselves (though I'd love the help), I'm just trying to find
acceptance for annotations about where the sanitizer can ignore overflow.
> Put another way: I'm putting the onus on YOU to make sure your pet
> project is the "Yogi Bear" of pet projects - smarter than the average
> bear.
Sure, but if the bar is omniscience, that's not a path forward.
I haven't really heard a viable counterproposal here. Let me try something
more concrete: how would you propose we systemically solve flaws like
the one fixed below? And note that the first fix was incomplete and it
took 8 years before it got fully fixed:
a723968c0ed3 ("perf: Fix u16 overflows") (v4.3)
382c27f4ed28 ("perf: Fix perf_event_validate_size()") (v6.7)
Because the kernel didn't catch when the u16 overflowed, it allowed for
kernel memory content exposure to userspace. And that's just the place
that it got noticed. With the struct holding the u16 is referenced in
360 different source files, it's not always a trivial analysis. But,
for example, the sanitizer not finding a "wraps" annotation on the
perf_event::read_size variable would have caught it immediately on
overflow.
And I mean this _kind_ of bug; not this bug in particular. I'm just
using it as a hopefully reasonable example of the complexity level we
need to solve under.
-Kees
--
Kees Cook
Powered by blists - more mailing lists