Message-Id: <00786ddb6d626442879072851d455bbd8ef1181d.e1f179e9.b8a1.463b.998f.98948268e9ea@bytedance.com>
Date: Fri, 21 Nov 2025 16:26:19 +0800
From: "Zhanpeng Zhang" <zhangzhanpeng.jasper@...edance.com>
To: <cleger@...osinc.com>
Cc: "Paul Walmsley" <paul.walmsley@...ive.com>,
"Palmer Dabbelt" <palmer@...belt.com>,
"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
"Himanshu Chauhan" <hchauhan@...tanamicro.com>,
"Anup Patel" <apatel@...tanamicro.com>,
路旭 <luxu.kernel@...edance.com>,
"Atish Patra" <atishp@...shpatra.org>,
Björn Töpel <bjorn@...osinc.com>,
崔运辉 <cuiyunhui@...edance.com>,
元竹 <yuanzhu@...edance.com>
Subject: SSE May Corrupt KVM's Context
Hi Clément,
I recently ran into another SSE problem: SSE events on the host can affect a guest. Neither PMU-SBI-SSE nor KVM-SSE is enabled. Simply running the basic riscv_sse_test to trigger SSE events through ecall is enough to disturb a KVM vcpu running in the background.
The following log shows the KVM and QEMU output when the vcpu crashes:
[ 152.228548] kvm [145]: VCPU exit error -14
[ 152.230664] kvm [145]: SEPC=0xffffffff80040280 SSTATUS=0x200004500 HSTATUS=0x200201100
[ 152.231868] kvm [145]: SCAUSE=0xf STVAL=0x3c8 HTVAL=0x0 HTINST=0x103023
error: kvm run failed Bad address
pc ffffffff80040280
mhartid 0000000000000000
mstatus 0000000200000100
mip 0000000000000000
mie 0000000000000000
mideleg 0000000000000000
medeleg 0000000000000000
mtvec 0000000000000000
mepc 0000000000000000
mcause 0000000000000000
mtval 0000000000000000
mscratch 0000000000000000
x0/zero 0000000000000000 x1/ra ffffffff80c1ac2c x2/sp ffffffff81e03cf0 x3/gp ffffffff8201aa68
x4/tp ffffffff81e0e0c0 x5/t0 ffffffff80b91a08 x6/t1 ffffaf80fee00000 x7/t2 0000000200000020
x8/s0 ffffffff81e03d50 x9/s1 0000000001000000 x10/a0 0000000000000000 x11/a1 0000000000000000
x12/a2 0000000001000000 x13/a3 ffffaf80ffe00000 x14/a4 0000000000000000 x15/a5 ffffaf7f80000000
x16/a6 0000000000000002 x17/a7 0000000000000002 x18/s2 ffffaf80fee00000 x19/s3 ffffffff81639170
x20/s4 ffffffff820200f8 x21/s5 0000000000000000 x22/s6 0000000000000000 x23/s7 0000000000000000
x24/s8 0000000000000000 x25/s9 0000000000000000 x26/s10 0000000000000000 x27/s11 0000000000000000
x28/t3 ffffffff80b85110 x29/t4 ffffffff8205d258 x30/t5 ffffffff8205d258 x31/t6 ffffffff8205d2a0
[ 152.312824] riscv_sse_test: FAILED: Failed to wait for event local_software_injected completion on CPU 2
[ 152.317933] riscv_sse_test: FAILED: Received SSE event -65536 on CPU 2 instead of 4
The following assembly is the context in which the vcpu crashes. It is the SAVE_GUEST_GPRS macro, which runs on the return to the host (vcpu_switch.S: Lkvm_switch_return).
sscratch has been wrongly set to 0 (or another invalid low address), which causes the crash when the guest context is stored.
0xffffffff8004027c <+252>: csrrw a0,sscratch,a0
0xffffffff80040280 <+256>: sd ra,968(a0)
0xffffffff80040284 <+260>: sd sp,976(a0)
0xffffffff80040288 <+264>: sd gp,984(a0)
0xffffffff8004028c <+268>: sd tp,992(a0)
0xffffffff80040290 <+272>: sd t0,1000(a0)
0xffffffff80040294 <+276>: sd t1,1008(a0)
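As a quick cross-check of the log against this listing, the arithmetic can be sketched in Python. The helper name decode_fault and the small cause table are illustrative only and are not part of the kernel or the test. Exception code 15 (store/AMO page fault) comes from the RISC-V privileged specification. The inputs are SCAUSE, STVAL, and the 968 offset of the `sd ra,968(a0)` instruction that SEPC points at.

```python
from typing import Tuple

# Subset of RISC-V exception codes (privileged specification).
EXCEPTION_CODES = {
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_fault(scause: int, stval: int, store_offset: int) -> Tuple[str, int]:
    """Hypothetical helper: name the cause and recover the store's base register.

    For a faulting `sd rs2, offset(rs1)`, stval holds rs1 + offset,
    so rs1 = stval - offset.
    """
    if scause >> 63:  # top bit set means an interrupt on RV64
        raise ValueError("scause describes an interrupt, not an exception")
    cause = EXCEPTION_CODES.get(scause, "exception code %d" % scause)
    base = stval - store_offset
    return cause, base


# Values taken from the log: SCAUSE=0xf, STVAL=0x3c8, `sd ra,968(a0)` at SEPC.
cause, base = decode_fault(0xF, 0x3C8, 968)
print(cause, hex(base))
```

With the values from the log this gives a store/AMO page fault with a base of 0, which is consistent with a0 (swapped in from sscratch) being 0 at the faulting store.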
This happens every time in my environment: with an idle VM running in the background, running the SSE test in a loop on the host crashes the vcpu. Can you reproduce this problem in your environment?
Regards,
Zhanpeng