Message-ID: <9e1a8d4f-251f-f78e-01a3-5c483249fac8@loongson.cn>
Date: Mon, 29 Dec 2025 11:53:13 +0800
From: lixianglai <lixianglai@...ngson.cn>
To: Jinyang He <hejinyang@...ngson.cn>, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: stable@...r.kernel.org, Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>, Tianrui Zhao <zhaotianrui@...ngson.cn>,
Bibo Mao <maobibo@...ngson.cn>, Charlie Jenkins <charlie@...osinc.com>,
Thomas Gleixner <tglx@...utronix.de>, Tiezhu Yang <yangtiezhu@...ngson.cn>
Subject: Re: [PATCH V3 2/2] LoongArch: KVM: fix "unreliable stack" issue
Hi Jinyang:
> On 2025-12-27 09:27, Xianglai Li wrote:
>
>> Insert the appropriate UNWIND hint macros into kvm_exc_entry in the
>> assembly code to guide the generation of correct ORC table entries,
>> thereby fixing the timeout when loading the livepatch-sample module
>> on a physical machine running virtual machines with multiple vCPUs.
>>
>> While fixing this problem we gain an additional benefit: more call
>> stack information can be obtained.
>>
>> Stack information that can be obtained before the problem is fixed:
>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>> [<0>] kvm_exc_entry+0x100/0x1e0
>>
>> Stack information that can be obtained after the problem is fixed:
>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>> [<0>] kvm_exc_entry+0x104/0x1e4
>> [<0>] kvm_enter_guest+0x38/0x11c
>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>> [<0>] sys_ioctl+0x498/0xf00
>> [<0>] do_syscall+0x98/0x1d0
>> [<0>] handle_syscall+0xb8/0x158
>>
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Xianglai Li <lixianglai@...ngson.cn>
>> ---
>> Cc: Huacai Chen <chenhuacai@...nel.org>
>> Cc: WANG Xuerui <kernel@...0n.name>
>> Cc: Tianrui Zhao <zhaotianrui@...ngson.cn>
>> Cc: Bibo Mao <maobibo@...ngson.cn>
>> Cc: Charlie Jenkins <charlie@...osinc.com>
>> Cc: Xianglai Li <lixianglai@...ngson.cn>
>> Cc: Thomas Gleixner <tglx@...utronix.de>
>> Cc: Tiezhu Yang <yangtiezhu@...ngson.cn>
>>
>> arch/loongarch/kvm/switch.S | 28 +++++++++++++++++++---------
>> 1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/loongarch/kvm/switch.S b/arch/loongarch/kvm/switch.S
>> index 93845ce53651..a3ea9567dbe5 100644
>> --- a/arch/loongarch/kvm/switch.S
>> +++ b/arch/loongarch/kvm/switch.S
>> @@ -10,6 +10,7 @@
>> #include <asm/loongarch.h>
>> #include <asm/regdef.h>
>> #include <asm/unwind_hints.h>
>> +#include <linux/kvm_types.h>
>> #define HGPR_OFFSET(x) (PT_R0 + 8*x)
>> #define GGPR_OFFSET(x) (KVM_ARCH_GGPR + 8*x)
>> @@ -110,9 +111,9 @@
>> * need to copy world switch code to DMW area.
>> */
>> .text
>> + .p2align PAGE_SHIFT
>> .cfi_sections .debug_frame
>> SYM_CODE_START(kvm_exc_entry)
>> - .p2align PAGE_SHIFT
>> UNWIND_HINT_UNDEFINED
>> csrwr a2, KVM_TEMP_KS
>> csrrd a2, KVM_VCPU_KS
>> @@ -170,6 +171,7 @@ SYM_CODE_START(kvm_exc_entry)
>> /* restore per cpu register */
>> ld.d u0, a2, KVM_ARCH_HPERCPU
>> addi.d sp, sp, -PT_SIZE
>> + UNWIND_HINT_REGS
>> /* Prepare handle exception */
>> or a0, s0, zero
>> @@ -200,7 +202,7 @@ ret_to_host:
>> jr ra
>> SYM_CODE_END(kvm_exc_entry)
>> -EXPORT_SYMBOL(kvm_exc_entry)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_exc_entry)
>> /*
>> * int kvm_enter_guest(struct kvm_run *run, struct kvm_vcpu *vcpu)
>> @@ -215,6 +217,14 @@ SYM_FUNC_START(kvm_enter_guest)
>> /* Save host GPRs */
>> kvm_save_host_gpr a2
>> + /*
>> + * The csr_era member variable of the pt_regs structure is required
>> + * for unwinding orc to perform stack traceback, so we need to put
>> + * pc into csr_era member variable here.
>> + */
>> + pcaddi t0, 0
>> + st.d t0, a2, PT_ERA
> Hi, Xianglai,
>
> It should use `SYM_CODE_START` to mark `kvm_enter_guest` rather than
> `SYM_FUNC_START`, since `SYM_FUNC_START` is used to mark "C-like"
> asm functions.
Ok, I will use SYM_CODE_START to mark kvm_enter_guest in the next version.
> I guess kvm_enter_guest is something like an exception
> handler because the last instruction is "ertn". So usually it should
> be marked with UNWIND_HINT_REGS, where the last frame info can be
> found via "$sp".
> However, all the info is stored relative to "$a2", so this mark should be
> `UNWIND_HINT sp_reg=ORC_REG_A2(???) type=UNWIND_HINT_TYPE_REGS`.
> I don't know why this function's internal PC is saved here by
> `pcaddi t0, 0`, and I think it has no meaning (exception handlers save
> the last PC by reading CSR.ERA). `kvm_enter_guest` saves registers via
> "$a2" ("$sp" - PT_REGS), beyond the stack ("$sp"), which is dangerous
> if IE is enabled. So I wonder if there is really a stacktrace through
> this function?
>
The stack backtracking issue in switch.S is rather complex because it
involves switching between CPU root-mode and guest-mode.
The real stack backtrace should be divided into two parts:
part 1:
[<0>] kvm_enter_guest+0x38/0x11c
[<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
[<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
[<0>] sys_ioctl+0x498/0xf00
[<0>] do_syscall+0x98/0x1d0
[<0>] handle_syscall+0xb8/0x158
part 2:
[<0>] kvm_vcpu_block+0x88/0x120 [kvm]
[<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
[<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
[<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
[<0>] kvm_handle_exit+0x160/0x270 [kvm]
[<0>] kvm_exc_entry+0x104/0x1e4
In "part 1", after kvm_enter_guest executes, the CPU switches from
root-mode to guest-mode.
A stack backtrace at this point is indeed very rare.
In "part 2", the CPU switches from guest-mode back to root-mode,
and most stack backtraces occur during this phase.
To obtain the longest call chain, we save the PC in kvm_enter_guest to
pt_regs.csr_era,
and after restoring the root-mode sp in kvm_exc_entry,
we re-establish the ORC entry using "UNWIND_HINT_REGS".
We then obtain the stack backtrace we wanted:
[<0>] kvm_vcpu_block+0x88/0x120 [kvm]
[<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
[<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
[<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
[<0>] kvm_handle_exit+0x160/0x270 [kvm]
[<0>] kvm_exc_entry+0x104/0x1e4
[<0>] kvm_enter_guest+0x38/0x11c
[<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
[<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
[<0>] sys_ioctl+0x498/0xf00
[<0>] do_syscall+0x98/0x1d0
[<0>] handle_syscall+0xb8/0x158
Doing so is equivalent to ignoring the details of the switch between
CPU root-mode and guest-mode.
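To illustrate why saving the PC into pt_regs.csr_era and adding the
UNWIND_HINT_REGS mark joins the two parts, here is a toy model in Python.
It is only a sketch, not kernel code and not how the ORC unwinder is
actually implemented: the Frame class, the hint values ("call", "regs",
"undefined") and the reliable flag are assumptions made up for this
illustration. Only the symbol names are taken from the traces in this mail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    func: str                            # symbol the frame's PC falls in
    hint: str = "call"                   # "call", "regs" or "undefined" (toy values)
    caller: Optional["Frame"] = None     # next frame, reached via the return address
    regs_era: Optional["Frame"] = None   # next frame, reached via pt_regs.csr_era


def chain(funcs, outer=None):
    """Link ordinary frames, given innermost first; the outermost points at `outer`."""
    frame = outer
    for name in reversed(funcs):
        frame = Frame(name, caller=frame)
    return frame


def unwind(frame):
    """Walk outwards from `frame`; stop and flag the trace at an undefined hint."""
    trace, reliable = [], True
    while frame is not None:
        trace.append(frame.func)
        if frame.hint == "undefined":
            reliable = False
            break
        # A "regs" frame continues at the PC saved in pt_regs.csr_era.
        frame = frame.regs_era if frame.hint == "regs" else frame.caller
    return trace, reliable


# "part 1": the host side, from kvm_enter_guest out to the syscall entry.
HOST = ["kvm_enter_guest", "kvm_arch_vcpu_ioctl_run", "kvm_vcpu_ioctl",
        "sys_ioctl", "do_syscall", "handle_syscall"]
# "part 2": the guest exit handling, innermost first.
EXIT = ["kvm_vcpu_block", "kvm_vcpu_halt", "kvm_emu_idle",
        "kvm_handle_gspr", "kvm_handle_exit"]


def build(patched):
    """Return the innermost frame of the modelled stack."""
    if patched:
        # csr_era holds a PC inside kvm_enter_guest, so the walk can go on.
        entry = Frame("kvm_exc_entry", hint="regs", regs_era=chain(HOST))
    else:
        entry = Frame("kvm_exc_entry", hint="undefined")
    return chain(EXIT, outer=entry)


for patched in (False, True):
    trace, reliable = unwind(build(patched))
    print("patched" if patched else "unpatched", "reliable:", reliable)
    for func in trace:
        print("  [<0>]", func)
```

In this model the unpatched walk stops at kvm_exc_entry, while the patched
walk continues through kvm_enter_guest up to handle_syscall, which matches
the shape of the two traces quoted above.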
As for your concern that this is dangerous when IE is enabled:
interrupts are always off during the root-mode/guest-mode
switch in kvm_enter_guest and kvm_exc_entry.
Thanks!
Xianglai.
> Jinyang
>
>
>> +
>> addi.d a2, a1, KVM_VCPU_ARCH
>> st.d sp, a2, KVM_ARCH_HSP
>> st.d tp, a2, KVM_ARCH_HTP
>> @@ -225,7 +235,7 @@ SYM_FUNC_START(kvm_enter_guest)
>> csrwr a1, KVM_VCPU_KS
>> kvm_switch_to_guest
>> SYM_FUNC_END(kvm_enter_guest)
>> -EXPORT_SYMBOL(kvm_enter_guest)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_enter_guest)
>> SYM_FUNC_START(kvm_save_fpu)
>> fpu_save_csr a0 t1
>> @@ -233,7 +243,7 @@ SYM_FUNC_START(kvm_save_fpu)
>> fpu_save_cc a0 t1 t2
>> jr ra
>> SYM_FUNC_END(kvm_save_fpu)
>> -EXPORT_SYMBOL(kvm_save_fpu)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_save_fpu)
>> SYM_FUNC_START(kvm_restore_fpu)
>> fpu_restore_double a0 t1
>> @@ -241,7 +251,7 @@ SYM_FUNC_START(kvm_restore_fpu)
>> fpu_restore_cc a0 t1 t2
>> jr ra
>> SYM_FUNC_END(kvm_restore_fpu)
>> -EXPORT_SYMBOL(kvm_restore_fpu)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_restore_fpu)
>> #ifdef CONFIG_CPU_HAS_LSX
>> SYM_FUNC_START(kvm_save_lsx)
>> @@ -250,7 +260,7 @@ SYM_FUNC_START(kvm_save_lsx)
>> lsx_save_data a0 t1
>> jr ra
>> SYM_FUNC_END(kvm_save_lsx)
>> -EXPORT_SYMBOL(kvm_save_lsx)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_save_lsx)
>> SYM_FUNC_START(kvm_restore_lsx)
>> lsx_restore_data a0 t1
>> @@ -258,7 +268,7 @@ SYM_FUNC_START(kvm_restore_lsx)
>> fpu_restore_csr a0 t1 t2
>> jr ra
>> SYM_FUNC_END(kvm_restore_lsx)
>> -EXPORT_SYMBOL(kvm_restore_lsx)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_restore_lsx)
>> #endif
>> #ifdef CONFIG_CPU_HAS_LASX
>> @@ -268,7 +278,7 @@ SYM_FUNC_START(kvm_save_lasx)
>> lasx_save_data a0 t1
>> jr ra
>> SYM_FUNC_END(kvm_save_lasx)
>> -EXPORT_SYMBOL(kvm_save_lasx)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_save_lasx)
>> SYM_FUNC_START(kvm_restore_lasx)
>> lasx_restore_data a0 t1
>> @@ -276,7 +286,7 @@ SYM_FUNC_START(kvm_restore_lasx)
>> fpu_restore_csr a0 t1 t2
>> jr ra
>> SYM_FUNC_END(kvm_restore_lasx)
>> -EXPORT_SYMBOL(kvm_restore_lasx)
>> +EXPORT_SYMBOL_FOR_KVM(kvm_restore_lasx)
>> #endif
>> #ifdef CONFIG_CPU_HAS_LBT