lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202405081035.6B1A791B@keescook>
Date: Wed, 8 May 2024 16:43:27 -0700
From: Kees Cook <keescook@...omium.org>
To: David Laight <David.Laight@...lab.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Justin Stitt <justinstitt@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Mark Rutland <mark.rutland@....com>,
	"linux-hardening@...r.kernel.org" <linux-hardening@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"llvm@...ts.linux.dev" <llvm@...ts.linux.dev>
Subject: Re: [RFC] Mitigating unexpected arithmetic overflow

On Wed, May 08, 2024 at 12:22:38PM +0000, David Laight wrote:
> Have you estimated the performance cost of checking the result of
> all integer arithmetic.

I hadn't included that in my already very long email as performance is
somewhat secondary to the correctness goals. Perhaps that was a mistake,
as it is going to be everyone's first question anyway. :) But yes,
I did have an initial analysis:


Performance Considerations
==========================
Adding arithmetic overflow checks, regardless of implementation,
will add more cycles. The good news is that the overflow outcome can
be pessimized, and checking the overflow bit on most architectures is
extraordinarily fast. Regardless, for many Linux deployments, the cost
of this correctness is seen as acceptable, though all users will benefit
from the fixing of bugs that the mitigation will find.

Looking specifically at proposal #1 below, we can do some estimations. For
a defconfig+kvm+FORTIFY config on v6.9-rc2 x86_64, the signed integer
overflow (SIO) checking is added in 21,552 places. The unsigned integer
overflow (UIO) checking is around 55,000 (though I would expect a good
portion of these to be eliminated as they are shown to be "wrap-around
expected"). Running with SIO enabled is mostly flawless, though a long
tail of false positives is expected. Running with UIO is not ready for
general consumption, and performing benchmarking there is not going to
give useful numbers. However, we can at least attempt to extrapolate from
an SIO build how a full SIO+UIO build would behave. For example, based
on instance counts, we could maybe assume SIO+UIO to be ~3x compared to
SIO. This could easily be an over or under estimation, though. Regardless,
some rough draft comparisons:

Image size
       Stock	60197540
    SIO,warn	60299436 +0.169%
    SIO,trap    60195567 -0.003% (trap mode probably found code to drop)

Kernel build 10x benchmark	Avg(s)		Std Dev
                     Stock	439.58		1.68
                  SIO,warn	444.20 (+1.00%)	1.35
                  SIO,trap      442.10 (+0.57%) 1.52

> If you have a cpu with 'signed/unsigned add(etc) with trap on overflow'
> instructions then maybe you could use them to panic the kernel.
> But otherwise you'll need a conditional branch after pretty much
> every arithmetic instruction.

Yes. This would be true for any implementation. Thankfully in some
places where bounds checking has already happened manually, the added
instrumentation checks will have been optimized away. But yes, turning
such a mitigation on isn't without impact. :) But a significant install
base is interested in correctness within a reasonable performance
budget. And some will take correctness over all other considerations.

> As well as the code bloat there is likely to be a 50% chance they
> are mis-predicted slowing things down a lot more.
> IIRC at least some x86 cpu do not default to static prediction (eg
> backwards taken forwards not) but always use data from the branch
> prediction logic - so the first time a branch is seen it is predicted
> randomly.

Sure, though I think the nuances of CPU design are somewhat tangential
to the overall topic: how do we provide a way for Linux to gain this
correctness coverage? It's accepted that there will be a non-trivial
impact, and I agree we can start to consider how to optimize
implementations. But for now I'd like to solve for how to even represent
arithmetic overflow intent at the source level.

-Kees

-- 
Kees Cook

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ