lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <efa4ef2d-aef7-0f64-07bc-55d0c4d1d6d2@loongson.cn>
Date: Tue, 30 Dec 2025 10:24:07 +0800
From: Bibo Mao <maobibo@...ngson.cn>
To: Jinyang He <hejinyang@...ngson.cn>, lixianglai <lixianglai@...ngson.cn>
Cc: loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org,
 kvm@...r.kernel.org, stable@...r.kernel.org,
 Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
 Tianrui Zhao <zhaotianrui@...ngson.cn>,
 Charlie Jenkins <charlie@...osinc.com>, Thomas Gleixner
 <tglx@...utronix.de>, Tiezhu Yang <yangtiezhu@...ngson.cn>
Subject: Re: [PATCH V3 2/2] LoongArch: KVM: fix "unreliable stack" issue



On 2025/12/29 下午6:41, Jinyang He wrote:
> On 2025-12-29 18:11, lixianglai wrote:
> 
>> Hi Jinyang:
>>>
>>> On 2025-12-29 11:53, lixianglai wrote:
>>>> Hi Jinyang:
>>>>> On 2025-12-27 09:27, Xianglai Li wrote:
>>>>>
>>>>>> Insert the appropriate UNWIND macro definition into the 
>>>>>> kvm_exc_entry in
>>>>>> the assembly function to guide the generation of correct ORC table 
>>>>>> entries,
>>>>>> thereby solving the timeout problem of loading the 
>>>>>> livepatch-sample module
>>>>>> on a physical machine running multiple vcpus virtual machines.
>>>>>>
>>>>>> While solving the above problems, we have gained an additional 
>>>>>> benefit,
>>>>>> that is, we can obtain more call stack information
>>>>>>
>>>>>> Stack information that can be obtained before the problem is fixed:
>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>>>> [<0>] kvm_exc_entry+0x100/0x1e0
>>>>>>
>>>>>> Stack information that can be obtained after the problem is fixed:
>>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>>>> [<0>] kvm_exc_entry+0x104/0x1e4
>>>>>> [<0>] kvm_enter_guest+0x38/0x11c
>>>>>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>>>>>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>>>>>> [<0>] sys_ioctl+0x498/0xf00
>>>>>> [<0>] do_syscall+0x98/0x1d0
>>>>>> [<0>] handle_syscall+0xb8/0x158
>>>>>>
>>>>>> Cc: stable@...r.kernel.org
>>>>>> Signed-off-by: Xianglai Li <lixianglai@...ngson.cn>
>>>>>> ---
>>>>>> Cc: Huacai Chen <chenhuacai@...nel.org>
>>>>>> Cc: WANG Xuerui <kernel@...0n.name>
>>>>>> Cc: Tianrui Zhao <zhaotianrui@...ngson.cn>
>>>>>> Cc: Bibo Mao <maobibo@...ngson.cn>
>>>>>> Cc: Charlie Jenkins <charlie@...osinc.com>
>>>>>> Cc: Xianglai Li <lixianglai@...ngson.cn>
>>>>>> Cc: Thomas Gleixner <tglx@...utronix.de>
>>>>>> Cc: Tiezhu Yang <yangtiezhu@...ngson.cn>
>>>>>>
>>>>>>   arch/loongarch/kvm/switch.S | 28 +++++++++++++++++++---------
>>>>>>   1 file changed, 19 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/loongarch/kvm/switch.S 
>>>>>> b/arch/loongarch/kvm/switch.S
>>>>>> index 93845ce53651..a3ea9567dbe5 100644
>>>>>> --- a/arch/loongarch/kvm/switch.S
>>>>>> +++ b/arch/loongarch/kvm/switch.S
>>>>>> @@ -10,6 +10,7 @@
>>>>>>   #include <asm/loongarch.h>
>>>>>>   #include <asm/regdef.h>
>>>>>>   #include <asm/unwind_hints.h>
>>>>>> +#include <linux/kvm_types.h>
>>>>>>     #define HGPR_OFFSET(x)        (PT_R0 + 8*x)
>>>>>>   #define GGPR_OFFSET(x)        (KVM_ARCH_GGPR + 8*x)
>>>>>> @@ -110,9 +111,9 @@
>>>>>>        * need to copy world switch code to DMW area.
>>>>>>        */
>>>>>>       .text
>>>>>> +    .p2align PAGE_SHIFT
>>>>>>       .cfi_sections    .debug_frame
>>>>>>   SYM_CODE_START(kvm_exc_entry)
>>>>>> -    .p2align PAGE_SHIFT
>>>>>>       UNWIND_HINT_UNDEFINED
>>>>>>       csrwr    a2,   KVM_TEMP_KS
>>>>>>       csrrd    a2,   KVM_VCPU_KS
>>>>>> @@ -170,6 +171,7 @@ SYM_CODE_START(kvm_exc_entry)
>>>>>>       /* restore per cpu register */
>>>>>>       ld.d    u0, a2, KVM_ARCH_HPERCPU
>>>>>>       addi.d    sp, sp, -PT_SIZE
>>>>>> +    UNWIND_HINT_REGS
>>>>>>         /* Prepare handle exception */
>>>>>>       or    a0, s0, zero
>>>>>> @@ -200,7 +202,7 @@ ret_to_host:
>>>>>>       jr      ra
>>>>>>     SYM_CODE_END(kvm_exc_entry)
>>>>>> -EXPORT_SYMBOL(kvm_exc_entry)
>>>>>> +EXPORT_SYMBOL_FOR_KVM(kvm_exc_entry)
>>>>>>     /*
>>>>>>    * int kvm_enter_guest(struct kvm_run *run, struct kvm_vcpu *vcpu)
>>>>>> @@ -215,6 +217,14 @@ SYM_FUNC_START(kvm_enter_guest)
>>>>>>       /* Save host GPRs */
>>>>>>       kvm_save_host_gpr a2
>>>>>>   +    /*
>>>>>> +     * The csr_era member variable of the pt_regs structure is 
>>>>>> required
>>>>>> +     * for unwinding orc to perform stack traceback, so we need 
>>>>>> to put
>>>>>> +     * pc into csr_era member variable here.
>>>>>> +     */
>>>>>> +    pcaddi    t0, 0
>>>>>> +    st.d    t0, a2, PT_ERA
>>>>> Hi, Xianglai,
>>>>>
>>>>> It should use `SYM_CODE_START` to mark the `kvm_enter_guest` rather 
>>>>> than
>>>>> `SYM_FUNC_START`, since the `SYM_FUNC_START` is used to mark 
>>>>> "C-likely"
>>>>> asm functionw. 
>>>>
>>>> Ok, I will use SYM_CODE_START to mark kvm_enter_guest in the next 
>>>> version.
>>>>
>>>>> I guess the kvm_enter_guest is something like exception
>>>>> handler becuase the last instruction is "ertn". So usually it should
>>>>> mark UNWIND_HINT_REGS where can find last frame info by "$sp".
>>>>> However, all info is store to "$a2", this mark should be
>>>>>   `UNWIND_HINT sp_reg=ORC_REG_A2(???) type=UNWIND_HINT_TYPE_REGS`.
>>>>> I don't konw why save this function internal PC here by `pcaddi t0, 
>>>>> 0`,
>>>>> and I think it is no meaning(, for exception handler, they save 
>>>>> last PC
>>>>> by read CSR.ERA). The `kvm_enter_guest` saves registers by
>>>>> "$a2"("$sp" - PT_REGS) beyond stack ("$sp"), it is dangerous if IE
>>>>> is enable. So I wonder if there is really a stacktrace through this 
>>>>> function?
>>>>>
>>>> The stack backtracking issue in switch.S is rather complex because 
>>>> it involves the switching between cpu root-mode and guest-mode:
>>>> Real stack backtracking should be divided into two parts:
>>>> part 1:
>>>>     [<0>] kvm_enter_guest+0x38/0x11c
>>>>     [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>>>>     [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>>>>     [<0>] sys_ioctl+0x498/0xf00
>>>>     [<0>] do_syscall+0x98/0x1d0
>>>>     [<0>] handle_syscall+0xb8/0x158
>>>>
>>>> part 2:
>>>>     [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>>     [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>>     [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>>     [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>>     [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>>     [<0>] kvm_exc_entry+0x104/0x1e4
>>>>
>>>>
>>>> In "part 1", after executing kvm_enter_guest, the cpu switches from 
>>>> root-mode to guest-mode.
>>>> In this case, stack backtracking is indeed very rare.
>>>>
>>>> In "part 2", the cpu switches from the guest-mode to the root-mode,
>>>> and most of the stack backtracking occurs during this phase.
>>>>
>>>> To obtain the longest call chain, we save pc in kvm_enter_guest to 
>>>> pt_regs.csr_era,
>>>> and after restoring the sp of the root-mode cpu in kvm_exc_entry,
>>>> The ORC entry was re-established using "UNWIND_HINT_REGS",
>>>>  and then we obtained the following stack backtrace as we wanted:
>>>>
>>>>     [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>>>     [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>>>     [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>>>     [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>>>     [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>>>     [<0>] kvm_exc_entry+0x104/0x1e4
>>> I found this might be a coincidence—correct behavior due to the 
>>> incorrect
>>> UNWIND_HINT_REGS mark and unusual stack adjustment.
>>>
>>> First, the kvm_enter_guest contains only a single branch instruction, 
>>> ertn.
>>> It hardware-jump to the CSR.ERA address directly, jump into 
>>> kvm_exc_entry.
>>>
>>> At this point, the stack layout looks like this:
>>> -------------------------------
>>>   frame from call to `kvm_enter_guest`
>>> -------------------------------  <- $sp
>>>   PT_REGS
>>> -------------------------------  <- $a2
>>>
>>> Then kvm_exc_entry adjust stack without save any register (e.g. $ra, 
>>> $sp)
>>> but still marked UNWIND_HINT_REGS.
>>> After the adjustment:
>>> -------------------------------
>>>   frame from call to `kvm_enter_guest`
>>> -------------------------------
>>>   PT_REGS
>>> -------------------------------  <- $a2, new $sp
>>>
>>> During unwinding, when the unwinder reaches kvm_exc_entry,
>>> it meets the mark of PT_REGS and correctly recovers
>>>  pc = regs.csr_era, sp = regs.sp, ra = regs.ra
>>>
>> Yes, here unwinder does work as you say.
>>
>>> a) Can we avoid "ertn" rather than `jr reg (or jirl ra, reg, 0)`
>>> instead, like call?
>> No,  we need to rely on the 'ertn instruction return PIE to CRMD IE,
>> at the same time to ensure that its atomic,
>> there should be no other instruction than' ertn 'more appropriate here.
> You are right! I got it.
>>
>>> The kvm_exc_entry cannot back to kvm_enter_guest
>>> if we use "ertn", so should the kvm_enter_guest appear on the 
>>> stacktrace?
>>>
>>
>> It is flexible. As I mentioned above, the cpu completes the switch 
>> from host-mode to guest mode through kvm_enter_guest,
>> and then the switch from guest mode to host-mode through 
>> kvm_exc_entry. When we ignore the details of the host-mode
>> and guest-mode switching in the middle, we can understand that the 
>> host cpu has completed kvm_enter_guest->kvm_exc_entry.
>> From this perspective, I think it can exist in the call stack, and at 
>> the same time, we have obtained the maximum call stack information.
>>
>>
>>> b) Can we adjust $sp before entering kvm_exc_entry? Then we can mark
>>> UNWIND_HINT_REGS at the beginning of kvm_exc_entry, which something
>>> like ret_from_kernel_thread_asm.
>>>
>> The following command can be used to dump the orc entries of the kernel:
>> ./tools/objtool/objtool --dump vmlinux
>>
>> You can observe that not all orc entries are generated at the 
>> beginning of the function.
>> For example:
>> handle_tlb_protect
>> ftrace_stub
>> handle_reserved
>>
>> So, is it unnecessary for us to modify UNWIND_HINT_REGS in order to 
>> place it at the beginning of the function.
>>
>> If you have a better solution, could you provide an example of the 
>> modification?
>> I can test the feasibility of the solution.
>>
> The expression at the beginning of the function is incorrect (feeling 
> sorry).
> It should be marked where have all stacktrace info.
> Thanks for all the explaining, since I'm unfamiliar with kvm, I need 
> these to help my understanding.
> 
> Can you try with follows, with save regs by $sp, set more precise era to 
> pt_regs, and more unwind hint.
> 
> 
> diff --git a/arch/loongarch/kvm/switch.S b/arch/loongarch/kvm/switch.S
> index f1768b7a6194..8ed1d7b72c54 100644
> --- a/arch/loongarch/kvm/switch.S
> +++ b/arch/loongarch/kvm/switch.S
> @@ -14,13 +14,13 @@
>   #define GGPR_OFFSET(x)        (KVM_ARCH_GGPR + 8*x)
> 
>   .macro kvm_save_host_gpr base
> -    .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> +    .irp n,1,2,22,23,24,25,26,27,28,29,30,31
>       st.d    $r\n, \base, HGPR_OFFSET(\n)
>       .endr
>   .endm
> 
>   .macro kvm_restore_host_gpr base
> -    .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> +    .irp n,1,2,22,23,24,25,26,27,28,29,30,31
>       ld.d    $r\n, \base, HGPR_OFFSET(\n)
>       .endr
>   .endm
> @@ -88,6 +88,7 @@
>       /* Load KVM_ARCH register */
>       ld.d    a2, a2,    (KVM_ARCH_GGPR + 8 * REG_A2)
> 
> +111:
>       ertn /* Switch to guest: GSTAT.PGM = 1, ERRCTL.ISERR = 0, 
> TLBRPRMD.ISTLBR = 0 */
>   .endm
> 
> @@ -158,9 +159,10 @@ SYM_CODE_START(kvm_exc_entry)
>       csrwr    t0, LOONGARCH_CSR_GTLBC
>       ld.d    tp, a2, KVM_ARCH_HTP
>       ld.d    sp, a2, KVM_ARCH_HSP
> +    UNWIND_HINT_REGS
> +
>       /* restore per cpu register */
>       ld.d    u0, a2, KVM_ARCH_HPERCPU
> -    addi.d    sp, sp, -PT_SIZE
> 
>       /* Prepare handle exception */
>       or    a0, s0, zero
> @@ -184,10 +186,11 @@ SYM_CODE_START(kvm_exc_entry)
>       csrwr    s1, KVM_VCPU_KS
>       kvm_switch_to_guest
> 
> +    UNWIND_HINT_UNDEFINED
>   ret_to_host:
> -    ld.d    a2, a2, KVM_ARCH_HSP
> -    addi.d  a2, a2, -PT_SIZE
> -    kvm_restore_host_gpr    a2
> +    ld.d    sp, a2, KVM_ARCH_HSP
> +    kvm_restore_host_gpr    sp
> +    addi.d    sp, sp, PT_SIZE
>       jr      ra
> 
>   SYM_INNER_LABEL(kvm_exc_entry_end, SYM_L_LOCAL)
> @@ -200,11 +203,15 @@ SYM_CODE_END(kvm_exc_entry)
>    *  a0: kvm_run* run
>    *  a1: kvm_vcpu* vcpu
>    */
> -SYM_FUNC_START(kvm_enter_guest)
> +SYM_CODE_START(kvm_enter_guest)
> +    UNWIND_HINT_UNDEFINED
>       /* Allocate space in stack bottom */
> -    addi.d    a2, sp, -PT_SIZE
> +    addi.d    sp, sp, -PT_SIZE
>       /* Save host GPRs */
> -    kvm_save_host_gpr a2
> +    kvm_save_host_gpr sp
> +    la.pcrel a2, 111f
> +    st.d     a2, sp, PT_ERA
> +    UNWIND_HINT_REGS
> 
why the label 111f is more accurate?  Supposing there is hw breakpoint 
here and backtrace is called, what is the call trace stack then? obvious 
label 111f is not executed instead.

UNWIND_HINT_REGS is used for nested kernel stack, is that right?
With nested interrupt and exception handlers on LoongArch kernel, is 
UNWIND_HINT_REGS used?

SYM_CODE_START(ret_from_fork_asm)
         UNWIND_HINT_REGS
         move            a1, sp
         bl              ret_from_fork
         STACKLEAK_ERASE
         RESTORE_STATIC
         RESTORE_SOME
         RESTORE_SP_AND_RET
SYM_CODE_END(ret_from_fork_asm)
With this piece of code, what is contents of pt_regs? In generic it is 
called from sys_clone, era is user PC address, is that right? If so,
what is detailed usage in the beginning of ret_from_fork_asm?

Regards
Bibo Mao

>       addi.d    a2, a1, KVM_VCPU_ARCH
>       st.d    sp, a2, KVM_ARCH_HSP
> 
> Jinyang
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ