Message-ID: <6EFFB41B-9145-496E-8217-07AF404BE695@zytor.com>
Date: Sat, 22 Feb 2025 12:54:31 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Kent Overstreet <kent.overstreet@...ux.dev>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC: Ventura Jack <venturajack85@...il.com>, Gary Guo <gary@...yguo.net>,
        airlied@...il.com, boqun.feng@...il.com, david.laight.linux@...il.com,
        ej@...i.de, gregkh@...uxfoundation.org, hch@...radead.org,
        ksummit@...ts.linux.dev, linux-kernel@...r.kernel.org,
        miguel.ojeda.sandonis@...il.com, rust-for-linux@...r.kernel.org
Subject: Re: C aggregate passing (Rust kernel policy)

On February 22, 2025 12:00:04 PM PST, Kent Overstreet <kent.overstreet@...ux.dev> wrote:
>On Sat, Feb 22, 2025 at 11:18:33AM -0800, Linus Torvalds wrote:
>> On Sat, 22 Feb 2025 at 10:54, Kent Overstreet <kent.overstreet@...ux.dev> wrote:
>> >
>> > If that work is successful it could lead to significant improvements in
>> > code generation, since aliasing causes a lot of unnecessary spills and
>> > reloads - VLIW could finally become practical.
>> 
>> No.
>> 
>> Compiler people think aliasing matters. It very seldom does. And VLIW
>> will never become practical for entirely unrelated reasons (read: OoO
>> is fundamentally superior to VLIW in general purpose computing).
>
>OoO and VLIW are orthogonal, not exclusive, and we always want to go
>wider, if we can. Separately, the neverending gift that is Spectre should be
>making everyone reconsider how reliant we've become on OoO.
>
>We'll never get rid of OoO, I agree on that point. But I think it's
>worth some thought experiments about how many branches actually need to
>be there vs. how many are there because everyone's assumed "branches are
>cheap! (so it's totally fine if the CPU sucks at the alternatives)" on
>both the hardware and software side.
>
>e.g. cmov historically sucked (and may still, I don't know), but a _lot_
>of branches should just be dumb ALU ops. I wince at a lot of the
>assembly I see gcc generate for e.g. short multiword integer
>comparisons; there are a ton of places where it'll emit 3 or 5 branches
>where 1 is all you need if we had better ALU primitives.
>
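As a concrete illustration of that multiword-comparison point (a hypothetical sketch, not code from this thread): a 128-bit three-way compare written the obvious way tends to compile to a chain of conditional branches, while the same comparison phrased as ALU ops usually boils down to a couple of flag-materializing compares and one select. Whether any particular gcc version actually picks cmov for the final ternary is not verified here.

#include <stdint.h>

/* Obvious formulation: typically becomes several conditional branches. */
int cmp_u128_branchy(uint64_t ahi, uint64_t alo, uint64_t bhi, uint64_t blo)
{
	if (ahi < bhi) return -1;
	if (ahi > bhi) return  1;
	if (alo < blo) return -1;
	if (alo > blo) return  1;
	return 0;
}

/* ALU formulation: each (x > y) / (x < y) is a setcc-style op, and the
 * final select is a natural cmov candidate, so the branch count drops
 * to zero or one depending on the backend. */
int cmp_u128_alu(uint64_t ahi, uint64_t alo, uint64_t bhi, uint64_t blo)
{
	int hi = (ahi > bhi) - (ahi < bhi);
	int lo = (alo > blo) - (alo < blo);
	return hi ? hi : lo;
}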
>> Aliasing is one of those bug-bears where compiler people can make
>> trivial code optimizations that look really impressive. So compiler
>> people *love* having simplistic aliasing rules that don't require real
>> analysis, because the real analysis is hard (not just expensive, but
>> basically unsolvable).
>
>I don't think crazy compiler experiments from crazy C people have much
>relevance, here. I'm talking about if/when Rust is able to get this
>right.
>
>> The C standards body has been much too eager to embrace "undefined behavior".
>
>Agree on C, but for the rest I think you're just failing to imagine what
>we could have if everything wasn't tied to a language with
>broken/missing semantics w.r.t. aliasing.
>
>Yes, C will never get a memory model that gets rid of the spills and
>reloads. But Rust just might. It's got the right model at the reference
>level, we just need to see if they can push that down to raw pointers in
>unsafe code.
>
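To make the spill/reload point concrete, here is a standalone sketch (not from this thread) using C's restrict as the nearest stand-in for the guarantee a Rust &mut/& pair gives by construction: without it the compiler has to assume dst and src may overlap and reload *src after every store; with it the load can sit in a register for the whole loop.

#include <stddef.h>

/* No aliasing information: dst and src may overlap, so *src must be
 * reloaded on every iteration after the store through dst. */
void scale_may_alias(int *dst, const int *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += *src;
}

/* Identical body, but the restrict qualifiers let the compiler hoist
 * the load of *src out of the loop and vectorize more aggressively. */
void scale_noalias(int *restrict dst, const int *restrict src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += *src;
}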
>But consider what the world would look like if Rust fixes aliasing and
>we get a microarchitecture that's able to take advantage of it. Do a
>microarchitecture that focuses some on ALU ops to get rid of as many
>branches as possible (e.g. min/max, all your range checks that don't
>trap), get rid of loads and spills from aliasing so you're primarily
>running out of registers - and now you _do_ have enough instructions in
>a basic block, with fixed latency, that you can schedule at compile time
>to make VLIW worth it.
>
>I don't think it's that big of a leap. Lack of cooperation between
>hardware and compiler folks (and the fact that what the hardware people
>wanted was impossible at the time) was what killed Itanium, so if you
>fix those two things...
>
>> The kernel basically turns all that off, as much as possible. Overflow
>> isn't undefined in the kernel. Aliasing isn't undefined in the kernel.
>> Things like that.
>
>Yeah, the religion of undefined behaviour in C has been an absolute
>nightmare.
>
>It's not just the compiler folks though, that way of thinking has
>infected entirely too many people in kernel and userspace -
>"performance is the holy grail and all that matters and thou shalt shave
>every single damn instruction".
>
>Where this really comes up for me is assertions, because we're not
>giving great guidance there. It's always better to hit an assertion than
>walk off into undefined behaviour la la land, but people see "thou shalt
>not crash the kernel" as a reason not to use BUG_ON() when it _should_
>just mean "always handle the error if you can't prove that it can't
>happen".
>
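A small kernel-style sketch of that guidance (the struct and lookup function are invented for illustration; WARN_ON_ONCE() and BUG_ON() are the real kernel primitives): report the violated invariant loudly, but still hand the caller an error instead of taking the machine down.

#include <linux/bug.h>
#include <linux/errno.h>

/* Hypothetical table, purely for illustration. */
struct idx_table {
	unsigned int nr_slots;
	int *slots;
};

static int idx_table_lookup(const struct idx_table *t, unsigned int idx)
{
	/* "Can't happen" -- but if it does, complain once and fail
	 * gracefully rather than BUG_ON() and halt the kernel. */
	if (WARN_ON_ONCE(idx >= t->nr_slots))
		return -EINVAL;
	return t->slots[idx];
}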
>> When 'integer overflow' means that you can _sometimes_ remove one
>> single ALU operation in *some* loops, but the cost of it is that you
>> potentially introduced some seriously subtle security bugs, I think we
>> know it was the wrong thing to do.
>
>And those branches just _do not matter_ in practice, since if one side
>leads to a trap they're perfectly predicted and to a first approximation
>we're always bottlenecked on memory.
>
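For reference, the kind of loop the one-ALU-op argument is usually about (again a standalone sketch, not from the thread): with standard C semantics the compiler may assume the signed counter never wraps, so it can compute the trip count and use a 64-bit induction variable; with the wrapping semantics the kernel builds with (-fno-strict-overflow) it has to keep the 32-bit counter honest, typically costing a sign extension per iteration.

/* Hypothetical example: i += 4 can overflow when n is near INT_MAX, so
 * the "no signed overflow" assumption is what lets the compiler widen i
 * and drop the per-iteration sign extension. */
long sum_every_fourth(const int *a, int n)
{
	long s = 0;
	for (int i = 0; i < n; i += 4)
		s += a[i];
	return s;
}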

VLIW and OoO might seem orthogonal, but they aren't, because they are trying to solve the same problem. Combining them either means the OoO engine can't do a very good job because of false dependencies (if you are scheduling molecules), or you have to break the instructions down into atoms, at which point it is just an (often quite inefficient) RISC encoding. In short, VLIW *might* make sense when you are statically scheduling a known pipeline, but it is basically a dead end for evolution; so unless you can JIT your code for each new chip generation...

But OoO still is more powerful, because it can do *dynamic* scheduling. A cache miss doesn't necessarily mean that you have to stop the entire machine, for example.
