Message-Id: <00786ddb6d626442879072851d455bbd8ef1181d.e1f179e9.b8a1.463b.998f.98948268e9ea@bytedance.com>
Date: Fri, 21 Nov 2025 16:26:19 +0800
From: "Zhanpeng Zhang" <zhangzhanpeng.jasper@...edance.com>
To: <cleger@...osinc.com>
Cc: "Paul Walmsley" <paul.walmsley@...ive.com>, 
	"Palmer Dabbelt" <palmer@...belt.com>, 
	"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, 
	"Himanshu Chauhan" <hchauhan@...tanamicro.com>, 
	"Anup Patel" <apatel@...tanamicro.com>, 
	路旭 <luxu.kernel@...edance.com>, 
	"Atish Patra" <atishp@...shpatra.org>, 
	Björn Töpel <bjorn@...osinc.com>, 
	崔运辉 <cuiyunhui@...edance.com>, 
	元竹 <yuanzhu@...edance.com>
Subject: SSE May Corrupt KVM's Context

Hi Clément, 

I encountered another SSE problem recently: host SSE events can corrupt a running guest. Neither PMU-SBI-SSE nor KVM-SSE is enabled; simply running the plain riscv_sse_test, which triggers SSE events via ecall, is enough to break a KVM vCPU running in the background.
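
For reference, the trigger boils down to a single SBI ecall. Here is a minimal sketch of that path (not the actual riscv_sse_test source; the extension ID, inject function ID, and local software event ID below are my assumptions based on the SBI SSE proposal and should be checked against the real headers):

/*
 * Sketch only, not the riscv_sse_test source. The extension ID,
 * function ID, and event ID are assumptions from the SBI SSE proposal.
 */
#include <asm/sbi.h>

#define SBI_EXT_SSE                  0x535345     /* "SSE" (assumed) */
#define SBI_SSE_EVENT_INJECT         7            /* FID (assumed) */
#define SBI_SSE_EVENT_LOCAL_SOFTWARE 0xffff0000UL /* -65536, the value in the test log (assumed) */

/* Ask the SBI implementation to inject a local software SSE event. */
static int sse_inject_local_event(unsigned long hart)
{
	struct sbiret ret;

	ret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_INJECT,
			SBI_SSE_EVENT_LOCAL_SOFTWARE, hart, 0, 0, 0, 0);
	return ret.error;
}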

The following log is the output from KVM and QEMU when the vCPU crashes:

[  152.228548] kvm [145]: VCPU exit error -14
[  152.230664] kvm [145]: SEPC=0xffffffff80040280 SSTATUS=0x200004500 HSTATUS=0x200201100
[  152.231868] kvm [145]: SCAUSE=0xf STVAL=0x3c8 HTVAL=0x0 HTINST=0x103023
error: kvm run failed Bad address
 pc       ffffffff80040280
 mhartid  0000000000000000
 mstatus  0000000200000100
 mip      0000000000000000
 mie      0000000000000000
 mideleg  0000000000000000
 medeleg  0000000000000000
 mtvec    0000000000000000
 mepc     0000000000000000
 mcause   0000000000000000
 mtval    0000000000000000
 mscratch 0000000000000000
 x0/zero  0000000000000000 x1/ra    ffffffff80c1ac2c x2/sp    ffffffff81e03cf0 x3/gp    ffffffff8201aa68
 x4/tp    ffffffff81e0e0c0 x5/t0    ffffffff80b91a08 x6/t1    ffffaf80fee00000 x7/t2    0000000200000020
 x8/s0    ffffffff81e03d50 x9/s1    0000000001000000 x10/a0   0000000000000000 x11/a1   0000000000000000
 x12/a2   0000000001000000 x13/a3   ffffaf80ffe00000 x14/a4   0000000000000000 x15/a5   ffffaf7f80000000
 x16/a6   0000000000000002 x17/a7   0000000000000002 x18/s2   ffffaf80fee00000 x19/s3   ffffffff81639170
 x20/s4   ffffffff820200f8 x21/s5   0000000000000000 x22/s6   0000000000000000 x23/s7   0000000000000000
 x24/s8   0000000000000000 x25/s9   0000000000000000 x26/s10  0000000000000000 x27/s11  0000000000000000
 x28/t3   ffffffff80b85110 x29/t4   ffffffff8205d258 x30/t5   ffffffff8205d258 x31/t6   ffffffff8205d2a0
[  152.312824] riscv_sse_test: FAILED: Failed to wait for event local_software_injected completion on CPU 2
[  152.317933] riscv_sse_test: FAILED: Received SSE event -65536 on CPU 2 instead of 4

The following is the assembly context in which the vCPU crashes. It is the SAVE_GUEST_GPRS macro that runs when control returns to the host (vcpu_switch.S: Lkvm_switch_return).
sscratch has been set to 0 (or some other bad, low address) by mistake, so the stores that save the guest context fault.

   0xffffffff8004027c <+252>:   csrrw   a0,sscratch,a0   # a0 <- sscratch, which should hold the vcpu context pointer
   0xffffffff80040280 <+256>:   sd      ra,968(a0)       # faulting instruction (matches SEPC above); a0 is 0 here
   0xffffffff80040284 <+260>:   sd      sp,976(a0)
   0xffffffff80040288 <+264>:   sd      gp,984(a0)
   0xffffffff8004028c <+268>:   sd      tp,992(a0)
   0xffffffff80040290 <+272>:   sd      t0,1000(a0)
   0xffffffff80040294 <+276>:   sd      t1,1008(a0)
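
As a sanity check, the fault address itself confirms that a0 was 0 when the store executed: the first sd after the csrrw writes to offset 968, and 968 is 0x3c8, exactly the STVAL that KVM reported. In C terms:

#include <stdio.h>

int main(void)
{
	unsigned long sscratch = 0;         /* the corrupted value */
	unsigned long ra_slot_offset = 968; /* from "sd ra,968(a0)" */

	/* Prints 0x3c8, matching STVAL in the KVM error output above. */
	printf("fault address: %#lx\n", sscratch + ra_slot_offset);
	return 0;
}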

This problem occurs every time in my environment: with an idle VM running in the background, running the SSE test in a loop on the host crashes the vCPU. Can this problem be reproduced in your environment?

Regards,
Zhanpeng
