[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgkgnyW2V4gQQTDAOKXGZH0fqN=hApz1LFAE3OC3fhhrQ@mail.gmail.com>
Date: Sat, 5 Oct 2024 17:00:01 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Uros Bizjak <ubizjak@...il.com>, Ard Biesheuvel <ardb@...nel.org>,
Ard Biesheuvel <ardb+git@...gle.com>, linux-kernel@...r.kernel.org, x86@...nel.org,
Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>, Dennis Zhou <dennis@...nel.org>,
Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>, Juergen Gross <jgross@...e.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Arnd Bergmann <arnd@...db.de>,
Masahiro Yamada <masahiroy@...nel.org>, Kees Cook <kees@...nel.org>,
Nathan Chancellor <nathan@...nel.org>, Keith Packard <keithp@...thp.com>,
Justin Stitt <justinstitt@...gle.com>, Josh Poimboeuf <jpoimboe@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
Kan Liang <kan.liang@...ux.intel.com>, linux-doc@...r.kernel.org,
linux-pm@...r.kernel.org, kvm@...r.kernel.org, xen-devel@...ts.xenproject.org,
linux-efi@...r.kernel.org, linux-arch@...r.kernel.org,
linux-sparse@...r.kernel.org, linux-kbuild@...r.kernel.org,
linux-perf-users@...r.kernel.org, rust-for-linux@...r.kernel.org,
llvm@...ts.linux.dev
Subject: Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel
On Sat, 5 Oct 2024 at 16:37, H. Peter Anvin <hpa@...or.com> wrote:
>
> Sadly, that is not correct; neither gcc nor clang uses lea:
Looking around, this may be intentional. At least according to Agner,
several cores do better at "mov immediate" compared to "lea".
Eg a RIP-relative LEA on Zen 2 gets a throughput of two per cycle, but
a "MOV r,i" gets four. That got fixed in Zen 3 and later, but
apparently Intel had similar issues (Ivy Bridge: 1 LEA per cycle, vs 3
"mov i,r". Haswell is 1:4).
Of course, Agner's tables are good, but not necessarily always the
whole story. There are other instruction tables on the internet (eg
uops.info) with possibly more info.
And in reality, I would expect it to be a complete non-issue with any
OoO engine and real code, because you are very seldom ALU limited
particularly when there aren't any data dependencies.
But a RIP-relative LEA does seem to put a *bit* more pressure on the
core resources, so the compilers are may be right to pick a "mov".
Linus
Powered by blists - more mailing lists