lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAhV-H77o_4Khi-tNB7nK1yAzaD5O4RfJq+V0UyYo=1Ry_2AyQ@mail.gmail.com>
Date: Tue, 30 Dec 2025 10:13:45 +0800
From: Huacai Chen <chenhuacai@...nel.org>
To: Bibo Mao <maobibo@...ngson.cn>
Cc: Jinyang He <hejinyang@...ngson.cn>, lixianglai <lixianglai@...ngson.cn>, 
	loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org, kvm@...r.kernel.org, 
	stable@...r.kernel.org, WANG Xuerui <kernel@...0n.name>, 
	Tianrui Zhao <zhaotianrui@...ngson.cn>, Charlie Jenkins <charlie@...osinc.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Tiezhu Yang <yangtiezhu@...ngson.cn>
Subject: Re: [PATCH V3 2/2] LoongArch: KVM: fix "unreliable stack" issue

On Tue, Dec 30, 2025 at 9:21 AM Bibo Mao <maobibo@...ngson.cn> wrote:
>
>
>
> On 2025/12/29 下午6:41, Jinyang He wrote:
> > On 2025-12-29 18:11, lixianglai wrote:
> >
> >> Hi Jinyang:
> >>>
> >>> On 2025-12-29 11:53, lixianglai wrote:
> >>>> Hi Jinyang:
> >>>>> On 2025-12-27 09:27, Xianglai Li wrote:
> >>>>>
> >>>>>> Insert the appropriate UNWIND macro definition into the
> >>>>>> kvm_exc_entry in
> >>>>>> the assembly function to guide the generation of correct ORC table
> >>>>>> entries,
> >>>>>> thereby solving the timeout problem of loading the
> >>>>>> livepatch-sample module
> >>>>>> on a physical machine running multiple vcpus virtual machines.
> >>>>>>
> >>>>>> While solving the above problems, we have gained an additional
> >>>>>> benefit,
> >>>>>> that is, we can obtain more call stack information
> >>>>>>
> >>>>>> Stack information that can be obtained before the problem is fixed:
> >>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>>> [<0>] kvm_exc_entry+0x100/0x1e0
> >>>>>>
> >>>>>> Stack information that can be obtained after the problem is fixed:
> >>>>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>>> [<0>] kvm_exc_entry+0x104/0x1e4
> >>>>>> [<0>] kvm_enter_guest+0x38/0x11c
> >>>>>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
> >>>>>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
> >>>>>> [<0>] sys_ioctl+0x498/0xf00
> >>>>>> [<0>] do_syscall+0x98/0x1d0
> >>>>>> [<0>] handle_syscall+0xb8/0x158
> >>>>>>
> >>>>>> Cc: stable@...r.kernel.org
> >>>>>> Signed-off-by: Xianglai Li <lixianglai@...ngson.cn>
> >>>>>> ---
> >>>>>> Cc: Huacai Chen <chenhuacai@...nel.org>
> >>>>>> Cc: WANG Xuerui <kernel@...0n.name>
> >>>>>> Cc: Tianrui Zhao <zhaotianrui@...ngson.cn>
> >>>>>> Cc: Bibo Mao <maobibo@...ngson.cn>
> >>>>>> Cc: Charlie Jenkins <charlie@...osinc.com>
> >>>>>> Cc: Xianglai Li <lixianglai@...ngson.cn>
> >>>>>> Cc: Thomas Gleixner <tglx@...utronix.de>
> >>>>>> Cc: Tiezhu Yang <yangtiezhu@...ngson.cn>
> >>>>>>
> >>>>>>   arch/loongarch/kvm/switch.S | 28 +++++++++++++++++++---------
> >>>>>>   1 file changed, 19 insertions(+), 9 deletions(-)
> >>>>>>
> >>>>>> diff --git a/arch/loongarch/kvm/switch.S
> >>>>>> b/arch/loongarch/kvm/switch.S
> >>>>>> index 93845ce53651..a3ea9567dbe5 100644
> >>>>>> --- a/arch/loongarch/kvm/switch.S
> >>>>>> +++ b/arch/loongarch/kvm/switch.S
> >>>>>> @@ -10,6 +10,7 @@
> >>>>>>   #include <asm/loongarch.h>
> >>>>>>   #include <asm/regdef.h>
> >>>>>>   #include <asm/unwind_hints.h>
> >>>>>> +#include <linux/kvm_types.h>
> >>>>>>     #define HGPR_OFFSET(x)        (PT_R0 + 8*x)
> >>>>>>   #define GGPR_OFFSET(x)        (KVM_ARCH_GGPR + 8*x)
> >>>>>> @@ -110,9 +111,9 @@
> >>>>>>        * need to copy world switch code to DMW area.
> >>>>>>        */
> >>>>>>       .text
> >>>>>> +    .p2align PAGE_SHIFT
> >>>>>>       .cfi_sections    .debug_frame
> >>>>>>   SYM_CODE_START(kvm_exc_entry)
> >>>>>> -    .p2align PAGE_SHIFT
> >>>>>>       UNWIND_HINT_UNDEFINED
> >>>>>>       csrwr    a2,   KVM_TEMP_KS
> >>>>>>       csrrd    a2,   KVM_VCPU_KS
> >>>>>> @@ -170,6 +171,7 @@ SYM_CODE_START(kvm_exc_entry)
> >>>>>>       /* restore per cpu register */
> >>>>>>       ld.d    u0, a2, KVM_ARCH_HPERCPU
> >>>>>>       addi.d    sp, sp, -PT_SIZE
> >>>>>> +    UNWIND_HINT_REGS
> >>>>>>         /* Prepare handle exception */
> >>>>>>       or    a0, s0, zero
> >>>>>> @@ -200,7 +202,7 @@ ret_to_host:
> >>>>>>       jr      ra
> >>>>>>     SYM_CODE_END(kvm_exc_entry)
> >>>>>> -EXPORT_SYMBOL(kvm_exc_entry)
> >>>>>> +EXPORT_SYMBOL_FOR_KVM(kvm_exc_entry)
> >>>>>>     /*
> >>>>>>    * int kvm_enter_guest(struct kvm_run *run, struct kvm_vcpu *vcpu)
> >>>>>> @@ -215,6 +217,14 @@ SYM_FUNC_START(kvm_enter_guest)
> >>>>>>       /* Save host GPRs */
> >>>>>>       kvm_save_host_gpr a2
> >>>>>>   +    /*
> >>>>>> +     * The csr_era member variable of the pt_regs structure is
> >>>>>> required
> >>>>>> +     * for unwinding orc to perform stack traceback, so we need
> >>>>>> to put
> >>>>>> +     * pc into csr_era member variable here.
> >>>>>> +     */
> >>>>>> +    pcaddi    t0, 0
> >>>>>> +    st.d    t0, a2, PT_ERA
> >>>>> Hi, Xianglai,
> >>>>>
> >>>>> It should use `SYM_CODE_START` to mark the `kvm_enter_guest` rather
> >>>>> than
> >>>>> `SYM_FUNC_START`, since the `SYM_FUNC_START` is used to mark
> >>>>> "C-likely"
> >>>>> asm functionw.
> >>>>
> >>>> Ok, I will use SYM_CODE_START to mark kvm_enter_guest in the next
> >>>> version.
> >>>>
> >>>>> I guess the kvm_enter_guest is something like exception
> >>>>> handler becuase the last instruction is "ertn". So usually it should
> >>>>> mark UNWIND_HINT_REGS where can find last frame info by "$sp".
> >>>>> However, all info is store to "$a2", this mark should be
> >>>>>   `UNWIND_HINT sp_reg=ORC_REG_A2(???) type=UNWIND_HINT_TYPE_REGS`.
> >>>>> I don't konw why save this function internal PC here by `pcaddi t0,
> >>>>> 0`,
> >>>>> and I think it is no meaning(, for exception handler, they save
> >>>>> last PC
> >>>>> by read CSR.ERA). The `kvm_enter_guest` saves registers by
> >>>>> "$a2"("$sp" - PT_REGS) beyond stack ("$sp"), it is dangerous if IE
> >>>>> is enable. So I wonder if there is really a stacktrace through this
> >>>>> function?
> >>>>>
> >>>> The stack backtracking issue in switch.S is rather complex because
> >>>> it involves the switching between cpu root-mode and guest-mode:
> >>>> Real stack backtracking should be divided into two parts:
> >>>> part 1:
> >>>>     [<0>] kvm_enter_guest+0x38/0x11c
> >>>>     [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
> >>>>     [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
> >>>>     [<0>] sys_ioctl+0x498/0xf00
> >>>>     [<0>] do_syscall+0x98/0x1d0
> >>>>     [<0>] handle_syscall+0xb8/0x158
> >>>>
> >>>> part 2:
> >>>>     [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>     [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>     [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>     [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>     [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>     [<0>] kvm_exc_entry+0x104/0x1e4
> >>>>
> >>>>
> >>>> In "part 1", after executing kvm_enter_guest, the cpu switches from
> >>>> root-mode to guest-mode.
> >>>> In this case, stack backtracking is indeed very rare.
> >>>>
> >>>> In "part 2", the cpu switches from the guest-mode to the root-mode,
> >>>> and most of the stack backtracking occurs during this phase.
> >>>>
> >>>> To obtain the longest call chain, we save pc in kvm_enter_guest to
> >>>> pt_regs.csr_era,
> >>>> and after restoring the sp of the root-mode cpu in kvm_exc_entry,
> >>>> The ORC entry was re-established using "UNWIND_HINT_REGS",
> >>>>  and then we obtained the following stack backtrace as we wanted:
> >>>>
> >>>>     [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
> >>>>     [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
> >>>>     [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
> >>>>     [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
> >>>>     [<0>] kvm_handle_exit+0x160/0x270 [kvm]
> >>>>     [<0>] kvm_exc_entry+0x104/0x1e4
> >>> I found this might be a coincidence—correct behavior due to the
> >>> incorrect
> >>> UNWIND_HINT_REGS mark and unusual stack adjustment.
> >>>
> >>> First, the kvm_enter_guest contains only a single branch instruction,
> >>> ertn.
> >>> It hardware-jump to the CSR.ERA address directly, jump into
> >>> kvm_exc_entry.
> >>>
> >>> At this point, the stack layout looks like this:
> >>> -------------------------------
> >>>   frame from call to `kvm_enter_guest`
> >>> -------------------------------  <- $sp
> >>>   PT_REGS
> >>> -------------------------------  <- $a2
> >>>
> >>> Then kvm_exc_entry adjust stack without save any register (e.g. $ra,
> >>> $sp)
> >>> but still marked UNWIND_HINT_REGS.
> >>> After the adjustment:
> >>> -------------------------------
> >>>   frame from call to `kvm_enter_guest`
> >>> -------------------------------
> >>>   PT_REGS
> >>> -------------------------------  <- $a2, new $sp
> >>>
> >>> During unwinding, when the unwinder reaches kvm_exc_entry,
> >>> it meets the mark of PT_REGS and correctly recovers
> >>>  pc = regs.csr_era, sp = regs.sp, ra = regs.ra
> >>>
> >> Yes, here unwinder does work as you say.
> >>
> >>> a) Can we avoid "ertn" rather than `jr reg (or jirl ra, reg, 0)`
> >>> instead, like call?
> >> No,  we need to rely on the 'ertn instruction return PIE to CRMD IE,
> >> at the same time to ensure that its atomic,
> >> there should be no other instruction than' ertn 'more appropriate here.
> > You are right! I got it.
> >>
> >>> The kvm_exc_entry cannot back to kvm_enter_guest
> >>> if we use "ertn", so should the kvm_enter_guest appear on the
> >>> stacktrace?
> >>>
> >>
> >> It is flexible. As I mentioned above, the cpu completes the switch
> >> from host-mode to guest mode through kvm_enter_guest,
> >> and then the switch from guest mode to host-mode through
> >> kvm_exc_entry. When we ignore the details of the host-mode
> >> and guest-mode switching in the middle, we can understand that the
> >> host cpu has completed kvm_enter_guest->kvm_exc_entry.
> >> From this perspective, I think it can exist in the call stack, and at
> >> the same time, we have obtained the maximum call stack information.
> >>
> >>
> >>> b) Can we adjust $sp before entering kvm_exc_entry? Then we can mark
> >>> UNWIND_HINT_REGS at the beginning of kvm_exc_entry, which something
> >>> like ret_from_kernel_thread_asm.
> >>>
> >> The following command can be used to dump the orc entries of the kernel:
> >> ./tools/objtool/objtool --dump vmlinux
> >>
> >> You can observe that not all orc entries are generated at the
> >> beginning of the function.
> >> For example:
> >> handle_tlb_protect
> >> ftrace_stub
> >> handle_reserved
> >>
> >> So, is it unnecessary for us to modify UNWIND_HINT_REGS in order to
> >> place it at the beginning of the function.
> >>
> >> If you have a better solution, could you provide an example of the
> >> modification?
> >> I can test the feasibility of the solution.
> >>
> > The expression at the beginning of the function is incorrect (feeling
> > sorry).
> > It should be marked where have all stacktrace info.
> > Thanks for all the explaining, since I'm unfamiliar with kvm, I need
> > these to help my understanding.
> >
> > Can you try with follows, with save regs by $sp, set more precise era to
> > pt_regs, and more unwind hint.
> >
> >
> > diff --git a/arch/loongarch/kvm/switch.S b/arch/loongarch/kvm/switch.S
> > index f1768b7a6194..8ed1d7b72c54 100644
> > --- a/arch/loongarch/kvm/switch.S
> > +++ b/arch/loongarch/kvm/switch.S
> > @@ -14,13 +14,13 @@
> >   #define GGPR_OFFSET(x)        (KVM_ARCH_GGPR + 8*x)
> >
> >   .macro kvm_save_host_gpr base
> > -    .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> > +    .irp n,1,2,22,23,24,25,26,27,28,29,30,31
> >       st.d    $r\n, \base, HGPR_OFFSET(\n)
> >       .endr
> >   .endm
> >
> >   .macro kvm_restore_host_gpr base
> > -    .irp n,1,2,3,22,23,24,25,26,27,28,29,30,31
> > +    .irp n,1,2,22,23,24,25,26,27,28,29,30,31
> >       ld.d    $r\n, \base, HGPR_OFFSET(\n)
> >       .endr
> >   .endm
> > @@ -88,6 +88,7 @@
> >       /* Load KVM_ARCH register */
> >       ld.d    a2, a2,    (KVM_ARCH_GGPR + 8 * REG_A2)
> >
> > +111:
> >       ertn /* Switch to guest: GSTAT.PGM = 1, ERRCTL.ISERR = 0,
> > TLBRPRMD.ISTLBR = 0 */
> >   .endm
> >
> > @@ -158,9 +159,10 @@ SYM_CODE_START(kvm_exc_entry)
> >       csrwr    t0, LOONGARCH_CSR_GTLBC
> >       ld.d    tp, a2, KVM_ARCH_HTP
> >       ld.d    sp, a2, KVM_ARCH_HSP
> > +    UNWIND_HINT_REGS
> > +
> >       /* restore per cpu register */
> >       ld.d    u0, a2, KVM_ARCH_HPERCPU
> > -    addi.d    sp, sp, -PT_SIZE
> >
> >       /* Prepare handle exception */
> >       or    a0, s0, zero
> > @@ -184,10 +186,11 @@ SYM_CODE_START(kvm_exc_entry)
> >       csrwr    s1, KVM_VCPU_KS
> >       kvm_switch_to_guest
> >
> > +    UNWIND_HINT_UNDEFINED
> >   ret_to_host:
> > -    ld.d    a2, a2, KVM_ARCH_HSP
> > -    addi.d  a2, a2, -PT_SIZE
> > -    kvm_restore_host_gpr    a2
> > +    ld.d    sp, a2, KVM_ARCH_HSP
> > +    kvm_restore_host_gpr    sp
> > +    addi.d    sp, sp, PT_SIZE
> >       jr      ra
> >
> >   SYM_INNER_LABEL(kvm_exc_entry_end, SYM_L_LOCAL)
> > @@ -200,11 +203,15 @@ SYM_CODE_END(kvm_exc_entry)
> >    *  a0: kvm_run* run
> >    *  a1: kvm_vcpu* vcpu
> >    */
> > -SYM_FUNC_START(kvm_enter_guest)
> > +SYM_CODE_START(kvm_enter_guest)
> > +    UNWIND_HINT_UNDEFINED
> >       /* Allocate space in stack bottom */
> > -    addi.d    a2, sp, -PT_SIZE
> > +    addi.d    sp, sp, -PT_SIZE
> >       /* Save host GPRs */
> > -    kvm_save_host_gpr a2
> > +    kvm_save_host_gpr sp
> > +    la.pcrel a2, 111f
> > +    st.d     a2, sp, PT_ERA
> > +    UNWIND_HINT_REGS
> >
> >       addi.d    a2, a1, KVM_VCPU_ARCH
> >       st.d    sp, a2, KVM_ARCH_HSP
> >
> wow, so quick and wonderful.
>
> Huacai,
>   the loongarch kernel and kvm maintainer, what is your point about this?
Xianglai's patch really has some problems. But Jinyang's solution is a
little big, can we minimize the changes? For example, don't touch
kvm_exc_entry, only modify kvm_enter_guest. In my opinion the current
problem has no relationship with kvm_exc_entry (but I'm not sure).

Huacai

>
> Regards
> Bibo Mao
> > Jinyang
> >
>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ