Message-ID: <20210312113253.305040674@infradead.org>
Date: Fri, 12 Mar 2021 12:32:53 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: x86@...nel.org, rostedt@...dmis.org, hpa@...or.com,
torvalds@...uxfoundation.org
Cc: linux-kernel@...r.kernel.org, linux-toolchains@...r.kernel.org,
peterz@...radead.org, jpoimboe@...hat.com,
alexei.starovoitov@...il.com, mhiramat@...nel.org
Subject: [PATCH 0/2] x86: Remove ideal_nops[]
Hi!
A while ago Steve complained about x86 being weird for having different NOPs [1].
Having cursed the same thing before, I figured it was time to look at the NOP
situation.
32-bit simply isn't a performance target anymore, so all we need there is a set
of NOPs that works on every CPU.
x86_64 has two main NOP variants, NOPL and prefix NOP. NOPL was introduced by
P6 and is architecturally mandated for x86_64. However, some uarchs made the
choice to limit NOPL decoding to a single port, which obviously limits NOPL
throughput. Other uarchs have (severe) decoding penalties for excessive (>~3)
prefixes, hobbling prefix NOP throughput.
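
To make the two variants concrete, here is a minimal user-space sketch. It is
not the code from this series: the names nopl_table, fill_nopl and
emit_prefix_nop are made up for illustration. The byte sequences are the
recommended multi-byte NOP encodings from the Intel SDM. Lengths 1 and 2 are
too short for the 0F 1F opcode, so they fall back to plain NOP (90) and 66 90.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NOP_LEN 8

/* Row n holds an n-byte NOP; rows 3..8 are NOPL (opcode 0F 1F /0). */
static const unsigned char nopl_table[MAX_NOP_LEN + 1][MAX_NOP_LEN] = {
	{ 0 },						/* unused */
	{ 0x90 },					/* nop */
	{ 0x66, 0x90 },					/* osp nop */
	{ 0x0f, 0x1f, 0x00 },				/* nopl (%eax) */
	{ 0x0f, 0x1f, 0x40, 0x00 },			/* nopl 0x0(%eax) */
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },		/* nopl 0x0(%eax,%eax,1) */
	{ 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },		/* nopw 0x0(%eax,%eax,1) */
	{ 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },	/* nopl with disp32 */
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 }, /* nopl SIB + disp32 */
};

/* Fill buf[0..len) with the longest NOPs available, then the remainder. */
void fill_nopl(unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	while (len > 0) {
		size_t n = len > MAX_NOP_LEN ? MAX_NOP_LEN : len;

		memcpy(buf, nopl_table[n], n);
		buf += n;
		len -= n;
	}
}

/*
 * Emit a single n-byte prefix NOP: (n - 1) operand-size prefixes (0x66)
 * followed by the one-byte NOP (0x90). Long runs of prefixes are what
 * some uarchs decode slowly.
 */
void emit_prefix_nop(unsigned char *buf, size_t n)
{
	assert(buf != NULL && n >= 1);

	memset(buf, 0x66, n - 1);
	buf[n - 1] = 0x90;
}
```

With a single fixed table like this, an n-byte pad has the same bytes on every
CPU, which is the determinism this series is after.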
But the thing is, all the modern uarchs can handle both without issue; that is,
AMD K10 (2007) and later, and Intel Ivy Bridge (2012) and later. The only
exception is Atom, which has the prefix penalty.
Since ultimate performance of a 10-year-old chip (Intel Sandy Bridge, 2011) is
simply irrelevant today, remove variable NOPs and use NOPL.
This gives us deterministic NOPs and restores sanity.
[1] https://lkml.kernel.org/r/20210302105827.3403656c@gandalf.local.home