linux-kernel - Re: [PATCH 1/3] x86/umip: Check that the instruction opcode is at least two bytes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250919181635.GFaM2eA8GFtZ48sfb0@fat_crate.local>
Date: Fri, 19 Sep 2025 20:16:35 +0200
From: Borislav Petkov <bp@...en8.de>
To: Sean Christopherson <seanjc@...gle.com>,
	Masami Hiramatsu <mhiramat@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, Dan Snyder <dansnyder@...gle.com>
Subject: Re: [PATCH 1/3] x86/umip: Check that the instruction opcode is at
 least two bytes

On Fri, Aug 08, 2025 at 10:23:56AM -0700, Sean Christopherson wrote:
> When checking for a potential UMIP violation on #GP, verify the decoder
> found at least two opcode bytes to avoid false positives when the kernel
> encounters an unknown instruction that starts with 0f.  Because the array
> of opcode.bytes is zero-initialized by insn_init(), peeking at bytes[1]
> will misinterpret garbage as a potential SLDT or STR instruction, and can
> incorrectly trigger emulation.
> 
> E.g. if a vpalignr instruction
> 
>    62 83 c5 05 0f 08 ff     vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff
> 
> hits a #GP, the kernel emulates it as STR and squashes the #GP (and
> corrupts the userspace code stream).
> 
> Arguably the check should look for exactly two bytes, but no three byte
> opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be
> impossible to get a false positive if the first two opcode bytes match
> '0f 00' or '0f 01'.  Go with a more conservative check with respect to the
> existing code to minimize the chances of breaking userspace, e.g. due to
> decoder weirdness.

So I did some staring... I guess this fix is trying to address our insn
decoder shortcoming and calls it "weirdness", right?

$ objdump -d a.out | awk -f ./arch/x86/tools/objdump_reformat.awk | ./arch/x86/tools/insn_decoder_test 
./arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
./arch/x86/tools/insn_decoder_test: warning:    0:      62 83 c5 05 0f 08 ff    vpalignr $0xff,(%r8),%xmm23,%xmm17{%k5}
./arch/x86/tools/insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 6
./arch/x86/tools/insn_decoder_test: warning: Decoded and checked 1 instructions with 1 failures

Looks like it.

a.out has:

0000000000000000 <.text>:
   0:   62 83 c5 05 0f 08 ff    vpalignr $0xff,(%r8),%xmm23,%xmm17{%k5}

I guess just adding the insn to the table doesn't fix it.

Masami?

---
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 262f7ca1fb95..a23ff3c16908 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -23,7 +23,7 @@
 #
 # AVX Superscripts
 #  (ev): this opcode requires EVEX prefix.
-#  (es): this opcode requires EVEX prefix and is SCALABALE.
+#  (es): this opcode requires EVEX prefix and is SCALABLE.
 #  (evo): this opcode is changed by EVEX prefix (EVEX opcode)
 #  (v): this opcode requires VEX prefix.
 #  (v1): this opcode only supports 128bit VEX.
@@ -867,7 +867,7 @@ AVXcode: 3
 0c: vblendps Vx,Hx,Wx,Ib (66)
 0d: vblendpd Vx,Hx,Wx,Ib (66)
 0e: vpblendw Vx,Hx,Wx,Ib (66),(v1)
-0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1)
+0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1) | vpalignr Vx,kz,Hx,Wx,Ib (ev)
 14: vpextrb Rd/Mb,Vdq,Ib (66),(v1)
 15: vpextrw Rd/Mw,Vdq,Ib (66),(v1)
 16: vpextrd/q Ey,Vdq,Ib (66),(v1)


> Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions")
> Reported-by: Dan Snyder <dansnyder@...gle.com>
> Analyzed-by; Nick Bray <ncbray@...gle.com>
> Cc: stable@...r.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kernel/umip.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> index 5a4b21389b1d..406ac01ce16d 100644
> --- a/arch/x86/kernel/umip.c
> +++ b/arch/x86/kernel/umip.c
> @@ -156,8 +156,8 @@ static int identify_insn(struct insn *insn)
>  	if (!insn->modrm.nbytes)
>  		return -EINVAL;
>  
> -	/* All the instructions of interest start with 0x0f. */
> -	if (insn->opcode.bytes[0] != 0xf)
> +	/* The instructions of interest have 2-byte opcodes: 0F 00 or 0F 01. */
> +	if (insn->opcode.nbytes < 2 || insn->opcode.bytes[0] != 0xf)
>  		return -EINVAL;
>  
>  	if (insn->opcode.bytes[1] == 0x1) {
> -- 
> 2.50.1.703.g449372360f-goog
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette