Message-ID: <20251201112526.GBaS17JhPrvYGiWv3L@fat_crate.local>
Date: Mon, 1 Dec 2025 12:25:26 +0100
From: Borislav Petkov <bp@...en8.de>
To: Adrian Hunter <adrian.hunter@...el.com>,
Masami Hiramatsu <mhiramat@...nel.org>
Cc: linux-kernel@...r.kernel.org,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Jiri Olsa <jolsa@...hat.com>,
Dan Williams <dan.j.williams@...el.com>,
Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...capital.net>, X86 ML <x86@...nel.org>
Subject: Re: [PATCH V2 2/4] x86/insn: Add AVX-512 support to the instruction
decoder
On Sun, Nov 30, 2025 at 05:05:28PM +0100, Borislav Petkov wrote:
> Resurrecting a very old thread...
>
> On Wed, Jul 20, 2016 at 11:30:35AM +0300, Adrian Hunter wrote:
> > Add support for Intel's AVX-512 instructions to the instruction decoder.
> >
> > AVX-512 instructions are documented in Intel Architecture Instruction Set
> > Extensions Programming Reference (February 2016).
> >
> > AVX-512 instructions are identified by an EVEX prefix which, for the purpose
> > of instruction decoding, can be treated as though it were a 4-byte VEX
> > prefix.
> >
> > Existing instructions which can now accept an EVEX prefix need not be
> > further annotated in the op code map (x86-opcode-map.txt). In the case of
> > new instructions, the op code map is updated accordingly.
> >
> > Also add associated Mask Instructions that are used to manipulate mask
> > registers used in AVX-512 instructions.
> >
> > 'perf tools' instruction decoder is updated in a subsequent patch. And a
> > representative set of instructions is added to the perf tools new
> > instructions test in a subsequent patch.
> >
> > Signed-off-by: Adrian Hunter <adrian.hunter@...el.com>
> > ---
> > arch/x86/include/asm/inat.h | 17 ++-
> > arch/x86/include/asm/insn.h | 12 +-
> > arch/x86/lib/insn.c | 18 ++-
> > arch/x86/lib/x86-opcode-map.txt | 263 +++++++++++++++++++++++------------
> > arch/x86/tools/gen-insn-attr-x86.awk | 11 +-
> > 5 files changed, 220 insertions(+), 101 deletions(-)
> >
> > +78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
> > +79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
>
> This is all fine and dandy but those (ev*) flags cause the escape table to
> have INAT_EVEXONLY as a flag:
>
> const insn_attr_t inat_escape_table_1_1[INAT_OPCODE_TABLE_SIZE] = {
> ...
>
> [0x79] = INAT_MODRM | INAT_VEXOK | INAT_EVEXONLY,
>
> };
>
> except that that opcode is not EVEX-only. Intel's VMREAD and VMWRITE are *not*
> EVEX insns, and AMD has EXTRQ and INSERTQ at that opcode, with prefixes 66 and
> F2 respectively; both are SSE4a and neither is EVEX.
>
> The VMREAD and VMWRITE decoding happens to work out by pure chance because
> those are without a prefix and the check for prefix id in
> inat_get_escape_attribute() happens to not select that escape table.
>
> So the first thing that comes to mind is excluding opcodes like 0x79 which can
> be mixed type from that inat_must_vex() enforcement...?
>
> Masami, any other ideas?
The hack below seems to do the trick. We should probably go through all the
insn tables and, if there are more opcodes like that, turn the mixed bool
below into a proper flag:
diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
index 1d1c57c74d1f..e3216da11a7c 100644
--- a/tools/arch/x86/lib/insn.c
+++ b/tools/arch/x86/lib/insn.c
@@ -276,6 +276,7 @@ int insn_get_prefixes(struct insn *insn)
 int insn_get_opcode(struct insn *insn)
 {
 	struct insn_field *opcode = &insn->opcode;
+	bool mixed = false;
 	int pfx_id, ret;
 	insn_byte_t op;
@@ -348,13 +359,25 @@ int insn_get_opcode(struct insn *insn)
 	while (inat_is_escape(insn->attr)) {
 		/* Get escaped opcode */
 		op = get_next(insn_byte_t, insn);
+
 		opcode->bytes[opcode->nbytes++] = op;
 		pfx_id = insn_last_prefix_id(insn);
+
+		printf("%s: escaped op: 0x%x, pfx_id (insn table: none, 66, f3, f2): 0x%x, attr: 0x%x\n",
+		       __func__, op, pfx_id, insn->attr);
+
 		insn->attr = inat_get_escape_attribute(op, pfx_id, insn->attr);
+
+		printf("got attr: 0x%x\n", insn->attr);
 	}
-	if (inat_must_vex(insn->attr)) {
+	mixed = (opcode->bytes[0] == 0xf) && (opcode->bytes[1] == 0x79);
+
+	printf("%s: must_vex, mixed: %d\n", __func__, mixed);
+
+	if (inat_must_vex(insn->attr) && !mixed) {
 		/* This instruction is bad */
+		printf("%s: must_vex bad\n", __func__);
 		insn->attr = 0;
 		return -EINVAL;
 	}
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 0139b864ceef..d059c8e63bfe 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -474,7 +474,7 @@ AVXcode: 1
# Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
77: emms | vzeroupper | vzeroall
78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
-79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev) | EXTRQ
+79: VMWRITE Gy,Ey | EXTRQ Vo,Uo (66) | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
7b: vcvtusi2sd Vpd,Hpd,Ev (F2),(ev) | vcvtusi2ss Vps,Hps,Ev (F3),(ev) | vcvtps2qq/pd2qq Vx,Wx (66),(ev)
7c: vhaddpd Vpd,Hpd,Wpd (66) | vhaddps Vps,Hps,Wps (F2)
With that, insn_sanity decodes the 66-prefixed 0f 79 encoding properly:
$ echo "66 0f 79 ca" | ./insn_sanity -vvv -i -
Instruction = {
.prefixes = {
.value = 1711276134, bytes[] = {66, 0, 0, 66},
.got = 1, .nbytes = 1},
.rex_prefix = {
.value = 0, bytes[] = {0, 0, 0, 0},
.got = 1, .nbytes = 0},
.vex_prefix = {
.value = 0, bytes[] = {0, 0, 0, 0},
.got = 1, .nbytes = 0},
.opcode = {
.value = 30991, bytes[] = {f, 79, 0, 0},
.got = 1, .nbytes = 2},
.modrm = {
.value = 202, bytes[] = {ca, 0, 0, 0},
.got = 1, .nbytes = 1},
.sib = {
.value = 0, bytes[] = {0, 0, 0, 0},
.got = 1, .nbytes = 0},
.displacement = {
.value = 0, bytes[] = {0, 0, 0, 0},
.got = 1, .nbytes = 0},
.immediate1 = {
.value = 0, bytes[] = {0, 0, 0, 0},
.got = 1, .nbytes = 0},
.immediate2 = {
.value = 0, bytes[] = {0, 0, 0, 0},
.got = 0, .nbytes = 0},
.attr = 508000, .opnd_bytes = 2, .addr_bytes = 4,
.length = 4, .x86_64 = 0, .kaddr = 0x7ffc7d62aed0}
./insn_sanity: success: Decoded and checked 1 given instructions with 0 errors (seed:0x0)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette