Message-ID: <okfluptw2nqnxzcqhgbjz6ap7z5fgxfjv3ukh4rqd3bkadi326@btn45hcawpkt>
Date: Sat, 22 Feb 2025 17:46:32 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: David Laight <david.laight.linux@...il.com>
Cc: "H. Peter Anvin" <hpa@...or.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, Ventura Jack <venturajack85@...il.com>,
Gary Guo <gary@...yguo.net>, airlied@...il.com, boqun.feng@...il.com, ej@...i.de,
gregkh@...uxfoundation.org, hch@...radead.org, ksummit@...ts.linux.dev,
linux-kernel@...r.kernel.org, miguel.ojeda.sandonis@...il.com, rust-for-linux@...r.kernel.org
Subject: Re: C aggregate passing (Rust kernel policy)
On Sat, Feb 22, 2025 at 10:12:48PM +0000, David Laight wrote:
> On Sat, 22 Feb 2025 16:22:08 -0500
> Kent Overstreet <kent.overstreet@...ux.dev> wrote:
>
> > On Sat, Feb 22, 2025 at 12:54:31PM -0800, H. Peter Anvin wrote:
> > > VLIW and OoO might seem orthogonal, but they aren't – because they are
> > > trying to solve the same problem, combining them either means the OoO
> > > engine can't do a very good job because of false dependencies (if you
> > > are scheduling molecules) or you have to break the instructions down
> > > into atoms, at which point it is just a (often quite inefficient) RISC
> > > encoding. In short, VLIW *might* make sense when you are statically
> > > scheduling a known pipeline, but it is basically a dead end for
> > > evolution – so unless you can JIT your code for each new chip
> > > generation...
> >
> > JITing for each chip generation would be a part of any serious new VLIW
> > effort. It's plenty doable in the open source world and the gains are
> > too big to ignore.
>
> Doesn't most code get 'dumbed down' to whatever 'normal' ABI compilers
> can easily handle?
> A few hot loops might get optimised, but most code won't be.
> Of course AI/GPU code is going to spend a lot of time in some tight loops.
> But no one is going to go through the TCP stack and optimise the source
> so that a compiler can do a better job of it for 'this year's' cpu.
We're not actually talking about the normal sort of JIT - nothing
profile-guided and no dynamic recompilation, just specialization based
on the exact microarchitecture you're running on.
You'd probably do it by deferring the last stage of compilation and
plugging it into the dynamic linker with an on-disk cache - so it can
work with the LLVM toolchain and all the languages that target it.