Date: Mon, 10 Jun 2024 11:20:21 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Borislav Petkov <bp@...en8.de>
Cc: Peter Zijlstra <peterz@...radead.org>, Peter Anvin <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Rasmus Villemoes <linux@...musvillemoes.dk>, 
	Josh Poimboeuf <jpoimboe@...nel.org>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, "the arch/x86 maintainers" <x86@...nel.org>, 
	linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH] x86: add 'runtime constant' infrastructure

On Mon, 10 Jun 2024 at 05:02, Borislav Petkov <bp@...en8.de> wrote:
>
> I think we should accept patches using this only when there really is
> a good, perf reason for doing so. Not "I wanna use this fancy shite in
> my new driver just because...".

Absolutely.

So for example, if the code could possibly be a module, it's never
going to be able to use runtime constants.

If the code doesn't show up as "noticeable percentage of kernel time
on real loads", it will not be a valid use for runtime constants.

The reason I did __d_lookup_rcu() is that I have optimized that
function to hell and back before, and it *still* showed up at 14% of
kernel time on my "empty kernel build" benchmark. And the constant
load was a noticeable - but not dominant - part of that.
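
To make the "constant load" concrete, here is a minimal userspace
sketch of the pattern, not the kernel code. All names in it
(struct bucket, table, table_shift, table_init, bucket_for) are
hypothetical. A hash table is sized once at startup, so its base
pointer and shift never change afterwards, yet every lookup still has
to load both from memory before it can compute the bucket address:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hash chain entry. */
struct bucket {
	struct bucket *next;
	uint32_t hash;
};

/* Both are written once at startup and never change afterwards. */
static struct bucket **table;
static unsigned int table_shift;	/* 32 - log2(number of buckets) */

/* Size the table once; returns 0 on success, -1 on allocation failure. */
static int table_init(unsigned int bits)
{
	assert(bits >= 1 && bits <= 31);
	table = calloc((size_t)1 << bits, sizeof(*table));
	if (!table)
		return -1;
	table_shift = 32 - bits;
	return 0;
}

/*
 * Hot path: 'table' and 'table_shift' are ordinary variables, so the
 * compiler must emit two memory loads here on every call.
 */
static struct bucket **bucket_for(uint32_t hash)
{
	return table + (hash >> table_shift);
}
```

The idea under discussion is to remove those two loads by patching the
values into the instruction stream once they are known, which is also
why it only pays off in a function that is already this hot.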

And yes, it shows up that high because it's all D$ misses, and the
machine I tested on has more CPU cores than cache, so it's all kinds
of broken. But the point ends up being that __d_lookup_rcu() is just
very very hot on loads that just do a lot of 'stat()' calls (and such
loads exist and aren't just microbenchmarks).

I have other functions I see in the 5%+ range of kernel overhead on
real machines, but they tend to be things like clear_page(), which is
another kind of issue entirely.

And yes, the benchmarks I run are odd ("why would anybody care about
an empty kernel build?") but somewhat real to me (since I do builds
between every pull even when they just change a couple of files).

And yes, to actually even see anything other than the CPU security
issues on x86, you need to build without debug support, and without
retpolines etc. So my profiles are "fake" in that sense, because they
are the best case profiles without a lot of the horror that people
enable.

Others will have other real benchmarks, which is why I do think we'd
end up with more uses of this. But I would expect a handful, not
"hundreds".

I could imagine some runtime constant in the core networking socket
code, for example. Or in some scheduler thing. Or kernel entry code.

But not ever in a driver or a filesystem, for example. Once you've
gotten that far off the core code path, the "load a variable" overhead
just isn't relevant any more.

                Linus
