[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <39ee51e4-6073-6f00-44f5-46bbcbd393c2@loongson.cn>
Date: Tue, 30 Dec 2025 09:18:15 +0800
From: Bibo Mao <maobibo@...ngson.cn>
To: Jinyang He <hejinyang@...ngson.cn>, lixianglai <lixianglai@...ngson.cn>,
Huacai Chen <chenhuacai@...nel.org>
Cc: loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, stable@...r.kernel.org, WANG Xuerui
<kernel@...0n.name>, Tianrui Zhao <zhaotianrui@...ngson.cn>,
Charlie Jenkins <charlie@...osinc.com>, Thomas Gleixner
<tglx@...utronix.de>, Tiezhu Yang <yangtiezhu@...ngson.cn>
Subject: Re: [PATCH V3 2/2] LoongArch: KVM: fix "unreliable stack" issue
On 2025/12/29 下午6:41, Jinyang He wrote:
> On 2025-12-29 18:11, lixianglai wrote:
>
>> Hi Jinyang:
>>>
>>> On 2025-12-29 11:53, lixianglai wrote:
>>>> Hi Jinyang:
>>>>> On 2025-12-27 09:27, Xianglai Li wrote:
>>>>>
>>>>>> Insert the appropriate UNWIND macro definition into the
>>>>>> kvm_exc_entry in
>>>>>> the assembly function to guide the generation of correct ORC table
>>>>>> entries,
>>>>>> thereby solving the timeout problem of loading the
>>>>>> livepatch-sample module
>>>>>> on a physical machine running multiple vcpus virtual machines.
>>>>>>
>>>>>> While solving the above problems, we have gained an additional
>>>>>> benefit,
>>>>>> that is, we can obtain more call stack information
>>>>>>
>>>>>> Stack information that can be obtained before the problem is fixed:
>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>>>> [<0>] kvm_exc_entry+0x100/0x1e0
>>>>>>
>>>>>> Stack information that can be obtained after the problem is fixed:
>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>>>> [<0>] kvm_exc_entry+0x104/0x1e4
>>>>>> [<0>] kvm_enter_guest+0x38/0x11c
>>>>>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>>>>>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>>>>>> [<0>] sys_ioctl+0x498/0xf00
>>>>>> [<0>] do_syscall+0x98/0x1d0
>>>>>> [<0>] handle_syscall+0xb8/0x158
>>>>>>
>>>>>> Cc: stable@...r.kernel.org
>>>>>> Signed-off-by: Xianglai Li <lixianglai@...ngson.cn>
>>>>>> ---
>>>>>> Cc: Huacai Chen <chenhuacai@...nel.org>
>>>>>> Cc: WANG Xuerui <kernel@...0n.name>
>>>>>> Cc: Tianrui Zhao <zhaotianrui@...ngson.cn>
>>>>>> Cc: Bibo Mao <maobibo@...ngson.cn>
>>>>>> Cc: Charlie Jenkins <charlie@...osinc.com>
>>>>>> Cc: Xianglai Li <lixianglai@...ngson.cn>
>>>>>> Cc: Thomas Gleixner <tglx@...utronix.de>
>>>>>> Cc: Tiezhu Yang <yangtiezhu@...ngson.cn>
>>>>>>
>>>>>> arch/loongarch/kvm/switch.S | 28 +++++++++++++++++++---------
>>>>>> 1 file changed, 19 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/loongarch/kvm/switch.S
>>>>>> b/arch/loongarch/kvm/switch.S
>>>>>> index 93845ce53651..a3ea9567dbe5 100644
>>>>>> --- a/arch/loongarch/kvm/switch.S
>>>>>> +++ b/arch/loongarch/kvm/switch.S
>>>>>> @@ -10,6 +10,7 @@
>>>>>> #include <asm/loongarch.h>
>>>>>> #include <asm/regdef.h>
>>>>>> #include <asm/unwind_hints.h>
>>>>>> +#include <linux/kvm_types.h>
>>>>>> #define HGPR_OFFSET(x) (PT_R0 + 8*x)
>>>>>> #define GGPR_OFFSET(x) (KVM_ARCH_GGPR + 8*x)
>>>>>> @@ -110,9 +111,9 @@
>>>>>> * need to copy world switch code to DMW area.
>>>>>> */
>>>>>> .text
>>>>>> + .p2align PAGE_SHIFT
>>>>>> .cfi_sections .debug_frame
>>>>>> SYM_CODE_START(kvm_exc_entry)
>>>>>> - .p2align PAGE_SHIFT
>>>>>> UNWIND_HINT_UNDEFINED
>>>>>> csrwr a2, KVM_TEMP_KS
>>>>>> csrrd a2, KVM_VCPU_KS
>>>>>> @@ -170,6 +171,7 @@ SYM_CODE_START(kvm_exc_entry)
>>>>>> /* restore per cpu register */
>>>>>> ld.d u0, a2, KVM_ARCH_HPERCPU
>>>>>> addi.d sp, sp, -PT_SIZE
>>>>>> + UNWIND_HINT_REGS
>>>>>> /* Prepare handle exception */
>>>>>> or a0, s0, zero
>>>>>> @@ -200,7 +202,7 @@ ret_to_host:
>>>>>> jr ra
>>>>>> SYM_CODE_END(kvm_exc_entry)
>>>>>> -EXPORT_SYMBOL(kvm_exc_entry)
>>>>>> +EXPORT_SYMBOL_FOR_KVM(kvm_exc_entry)
>>>>>> /*
>>>>>> * int kvm_enter_guest(struct kvm_run *run, struct kvm_vcpu *vcpu)
>>>>>> @@ -215,6 +217,14 @@ SYM_FUNC_START(kvm_enter_guest)
>>>>>> /* Save host GPRs */
>>>>>> kvm_save_host_gpr a2
>>>>>> + /*
>>>>>> + * The csr_era member variable of the pt_regs structure is
>>>>>> required
>>>>>> + * for unwinding orc to perform stack traceback, so we need
>>>>>> to put
>>>>>> + * pc into csr_era member variable here.
>>>>>> + */
>>>>>> + pcaddi t0, 0
>>>>>> + st.d t0, a2, PT_ERA
>>>>> Hi, Xianglai,
>>>>>
>>>>> It should use `SYM_CODE_START` to mark the `kvm_enter_guest` rather
>>>>> than
>>>>> `SYM_FUNC_START`, since the `SYM_FUNC_START` is used to mark
>>>>> "C-likely"
>>>>> asm functionw.
>>>>
>>>> Ok, I will use SYM_CODE_START to mark kvm_enter_guest in the next
>>>> version.
>>>>
>>>>> I guess the kvm_enter_guest is something like exception
>>>>> handler becuase the last instruction is "ertn". So usually it should
>>>>> mark UNWIND_HINT_REGS where can find last frame info by "$sp".
>>>>> However, all info is store to "$a2", this mark should be
>>>>> `UNWIND_HINT sp_reg=ORC_REG_A2(???) type=UNWIND_HINT_TYPE_REGS`.
>>>>> I don't konw why save this function internal PC here by `pcaddi t0,
>>>>> 0`,
>>>>> and I think it is no meaning(, for exception handler, they save
>>>>> last PC
>>>>> by read CSR.ERA). The `kvm_enter_guest` saves registers by
>>>>> "$a2"("$sp" - PT_REGS) beyond stack ("$sp"), it is dangerous if IE
>>>>> is enable. So I wonder if there is really a stacktrace through this
>>>>> function?
>>>>>
>>>> The stack backtracking issue in switch.S is rather complex because
>>>> it involves the switching between cpu root-mode and guest-mode:
>>>> Real stack backtracking should be divided into two parts:
>>>> part 1:
>>>> [<0>] kvm_enter_guest+0x38/0x11c
>>>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>>>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>>>> [<0>] sys_ioctl+0x498/0xf00
>>>> [<0>] do_syscall+0x98/0x1d0
>>>> [<0>] handle_syscall+0xb8/0x158
>>>>
>>>> part 2:
>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>> [<0>] kvm_exc_entry+0x104/0x1e4
>>>>
>>>>
>>>> In "part 1", after executing kvm_enter_guest, the cpu switches from
>>>> root-mode to guest-mode.
>>>> In this case, stack backtracking is indeed very rare.
>>>>
>>>> In "part 2", the cpu switches from the guest-mode to the root-mode,
>>>> and most of the stack backtracking occurs during this phase.
>>>>
>>>> To obtain the longest call chain, we save pc in kvm_enter_guest to
>>>> pt_regs.csr_era,
>>>> and after restoring the sp of the root-mode cpu in kvm_exc_entry,
>>>> The ORC entry was re-established using "UNWIND_HINT_REGS",
>>>> and then we obtained the following stack backtrace as we wanted:
>>>>
>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>> [<0>] kvm_exc_entry+0x104/0x1e4
>>> I found this might be a coincidence—correct behavior due to the
>>> incorrect
>>> UNWIND_HINT_REGS mark and unusual stack adjustment.
>>>
>>> First, the kvm_enter_guest contains only a single branch instruction,
>>> ertn.
>>> It hardware-jump to the CSR.ERA address directly, jump into
>>> kvm_exc_entry.
>>>
>>> At this point, the stack layout looks like this:
>>> -------------------------------
>>> frame from call to `kvm_enter_guest`
>>> ------------------------------- <- $sp
>>> PT_REGS
>>> ------------------------------- <- $a2
>>>
>>> Then kvm_exc_entry adjust stack without save any register (e.g. $ra,
>>> $sp)
>>> but still marked UNWIND_HINT_REGS.
>>> After the adjustment:
>>> -------------------------------
>>> frame from call to `kvm_enter_guest`
>>> -------------------------------
>>> PT_REGS
>>> ------------------------------- <- $a2, new $sp
>>>
>>> During unwinding, when the unwinder reaches kvm_exc_entry,
>>> it meets the mark of PT_REGS and correctly recovers
>>> pc = regs.csr_era, sp = regs.sp, ra = regs.ra
>>>
>> Yes, here unwinder does work as you say.
>>
>>> a) Can we avoid "ertn" rather than `jr reg (or jirl ra, reg, 0)`
>>> instead, like call?
>> No, we need to rely on the 'ertn instruction return PIE to CRMD IE,
>> at the same time to ensure that its atomic,
>> there should be no other instruction than' ertn 'more appropriate here.
> You are right! I got it.
>>
>>> The kvm_exc_entry cannot back to kvm_enter_guest
>>> if we use "ertn", so should the kvm_enter_guest appear on the
>>> stacktrace?
>>>
>>
>> It is flexible. As I mentioned above, the cpu completes the switch
>> from host-mode to guest mode through kvm_enter_guest,
>> and then the switch from guest mode to host-mode through
>> kvm_exc_entry. When we ignore the details of the host-mode
>> and guest-mode switching in the middle, we can understand that the
>> host cpu has completed kvm_enter_guest->kvm_exc_entry.
>> From this perspective, I think it can exist in the call stack, and at
>> the same time, we have obtained the maximum call stack information.
>>
>>
>>> b) Can we adjust $sp before entering kvm_exc_entry? Then we can mark
>>> UNWIND_HINT_REGS at the beginning of kvm_exc_entry, which something
>>> like ret_from_kernel_thread_asm.
>>>
>> The following command can be used to dump the orc entries of the kernel:
>> ./tools/objtool/objtool --dump vmlinux
>>
>> You can observe that not all orc entries are generated at the
>> beginning of the function.
>> For example:
>> handle_tlb_protect
>> ftrace_stub
>> handle_reserved
>>
>> So, is it unnecessary for us to modify UNWIND_HINT_REGS in order to
>> place it at the beginning of the function.
>>
>> If you have a better solution, could you provide an example of the
>> modification?
>> I can test the feasibility of the solution.
>>
> The expression at the beginning of the function is incorrect (feeling
> sorry).
> It should be marked where have all stacktrace info.
> Thanks for all the explaining, since I'm unfamiliar with kvm, I need
> these to help my understanding.
>
> Can you try with follows, with save regs by $sp, set more precise era to
> pt_regs, and more unwind hint.
>
>
> diff --git a/arch/loongarch/kvm/switch.S b/arch/loongarch/kvm/switch.S
> index f1768b7a6194..8ed1d7b72c54 100644
> --- a/arch/loongarch/kvm/switch.S
> +++ b/arch/loongarch/kvm/switch.S
> @@ -14,13 +14,13 @@
> #define GGPR_OFFSET(x) (KVM_ARCH_GGPR + 8*x)
>
> .macro kvm_save_host_gpr base
> - .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> + .irp n,1,2,22,23,24,25,26,27,28,29,30,31
> st.d $r\n, \base, HGPR_OFFSET(\n)
> .endr
> .endm
>
> .macro kvm_restore_host_gpr base
> - .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> + .irp n,1,2,22,23,24,25,26,27,28,29,30,31
> ld.d $r\n, \base, HGPR_OFFSET(\n)
> .endr
> .endm
> @@ -88,6 +88,7 @@
> /* Load KVM_ARCH register */
> ld.d a2, a2, (KVM_ARCH_GGPR + 8 * REG_A2)
>
> +111:
> ertn /* Switch to guest: GSTAT.PGM = 1, ERRCTL.ISERR = 0,
> TLBRPRMD.ISTLBR = 0 */
> .endm
>
> @@ -158,9 +159,10 @@ SYM_CODE_START(kvm_exc_entry)
> csrwr t0, LOONGARCH_CSR_GTLBC
> ld.d tp, a2, KVM_ARCH_HTP
> ld.d sp, a2, KVM_ARCH_HSP
> + UNWIND_HINT_REGS
> +
> /* restore per cpu register */
> ld.d u0, a2, KVM_ARCH_HPERCPU
> - addi.d sp, sp, -PT_SIZE
>
> /* Prepare handle exception */
> or a0, s0, zero
> @@ -184,10 +186,11 @@ SYM_CODE_START(kvm_exc_entry)
> csrwr s1, KVM_VCPU_KS
> kvm_switch_to_guest
>
> + UNWIND_HINT_UNDEFINED
> ret_to_host:
> - ld.d a2, a2, KVM_ARCH_HSP
> - addi.d a2, a2, -PT_SIZE
> - kvm_restore_host_gpr a2
> + ld.d sp, a2, KVM_ARCH_HSP
> + kvm_restore_host_gpr sp
> + addi.d sp, sp, PT_SIZE
> jr ra
>
> SYM_INNER_LABEL(kvm_exc_entry_end, SYM_L_LOCAL)
> @@ -200,11 +203,15 @@ SYM_CODE_END(kvm_exc_entry)
> * a0: kvm_run* run
> * a1: kvm_vcpu* vcpu
> */
> -SYM_FUNC_START(kvm_enter_guest)
> +SYM_CODE_START(kvm_enter_guest)
> + UNWIND_HINT_UNDEFINED
> /* Allocate space in stack bottom */
> - addi.d a2, sp, -PT_SIZE
> + addi.d sp, sp, -PT_SIZE
> /* Save host GPRs */
> - kvm_save_host_gpr a2
> + kvm_save_host_gpr sp
> + la.pcrel a2, 111f
> + st.d a2, sp, PT_ERA
> + UNWIND_HINT_REGS
>
> addi.d a2, a1, KVM_VCPU_ARCH
> st.d sp, a2, KVM_ARCH_HSP
>
wow, so quick and wonderful.
Huacai,
the loongarch kernel and kvm maintainer, what is your point about this?
Regards
Bibo Mao
> Jinyang
>
Powered by blists - more mailing lists