Message-ID: <dec5cb06-6858-20f2-facb-d5e7f44f5d16@loongson.cn>
Date: Mon, 29 Dec 2025 15:10:40 +0800
From: Jinyang He <hejinyang@...ngson.cn>
To: lixianglai <lixianglai@...ngson.cn>
Cc: loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org,
 kvm@...r.kernel.org, stable@...r.kernel.org,
 Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
 Tianrui Zhao <zhaotianrui@...ngson.cn>, Bibo Mao <maobibo@...ngson.cn>,
 Charlie Jenkins <charlie@...osinc.com>, Thomas Gleixner
 <tglx@...utronix.de>, Tiezhu Yang <yangtiezhu@...ngson.cn>
Subject: Re: [PATCH V3 2/2] LoongArch: KVM: fix "unreliable stack" issue


On 2025-12-29 11:53, lixianglai wrote:
> Hi Jinyang:
>> On 2025-12-27 09:27, Xianglai Li wrote:
>>
>>> Insert the appropriate UNWIND macro definition into the kvm_exc_entry in
>>> the assembly function to guide the generation of correct ORC table
>>> entries, thereby solving the timeout problem of loading the
>>> livepatch-sample module on a physical machine running multiple vcpus
>>> virtual machines.
>>>
>>> While solving the above problem, we gain an additional benefit:
>>> we can obtain more call stack information.
>>>
>>> Stack information that can be obtained before the problem is fixed:
>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>> [<0>] kvm_exc_entry+0x100/0x1e0
>>>
>>> Stack information that can be obtained after the problem is fixed:
>>> [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>>> [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>>> [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>>> [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>>> [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>>> [<0>] kvm_exc_entry+0x104/0x1e4
>>> [<0>] kvm_enter_guest+0x38/0x11c
>>> [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>>> [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>>> [<0>] sys_ioctl+0x498/0xf00
>>> [<0>] do_syscall+0x98/0x1d0
>>> [<0>] handle_syscall+0xb8/0x158
>>>
>>> Cc: stable@...r.kernel.org
>>> Signed-off-by: Xianglai Li <lixianglai@...ngson.cn>
>>> ---
>>> Cc: Huacai Chen <chenhuacai@...nel.org>
>>> Cc: WANG Xuerui <kernel@...0n.name>
>>> Cc: Tianrui Zhao <zhaotianrui@...ngson.cn>
>>> Cc: Bibo Mao <maobibo@...ngson.cn>
>>> Cc: Charlie Jenkins <charlie@...osinc.com>
>>> Cc: Xianglai Li <lixianglai@...ngson.cn>
>>> Cc: Thomas Gleixner <tglx@...utronix.de>
>>> Cc: Tiezhu Yang <yangtiezhu@...ngson.cn>
>>>
>>>   arch/loongarch/kvm/switch.S | 28 +++++++++++++++++++---------
>>>   1 file changed, 19 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/loongarch/kvm/switch.S b/arch/loongarch/kvm/switch.S
>>> index 93845ce53651..a3ea9567dbe5 100644
>>> --- a/arch/loongarch/kvm/switch.S
>>> +++ b/arch/loongarch/kvm/switch.S
>>> @@ -10,6 +10,7 @@
>>>   #include <asm/loongarch.h>
>>>   #include <asm/regdef.h>
>>>   #include <asm/unwind_hints.h>
>>> +#include <linux/kvm_types.h>
>>>
>>>   #define HGPR_OFFSET(x)        (PT_R0 + 8*x)
>>>   #define GGPR_OFFSET(x)        (KVM_ARCH_GGPR + 8*x)
>>> @@ -110,9 +111,9 @@
>>>        * need to copy world switch code to DMW area.
>>>        */
>>>       .text
>>> +    .p2align PAGE_SHIFT
>>>       .cfi_sections    .debug_frame
>>>   SYM_CODE_START(kvm_exc_entry)
>>> -    .p2align PAGE_SHIFT
>>>       UNWIND_HINT_UNDEFINED
>>>       csrwr    a2,   KVM_TEMP_KS
>>>       csrrd    a2,   KVM_VCPU_KS
>>> @@ -170,6 +171,7 @@ SYM_CODE_START(kvm_exc_entry)
>>>       /* restore per cpu register */
>>>       ld.d    u0, a2, KVM_ARCH_HPERCPU
>>>       addi.d    sp, sp, -PT_SIZE
>>> +    UNWIND_HINT_REGS
>>>
>>>       /* Prepare handle exception */
>>>       or    a0, s0, zero
>>> @@ -200,7 +202,7 @@ ret_to_host:
>>>       jr      ra
>>>
>>>   SYM_CODE_END(kvm_exc_entry)
>>> -EXPORT_SYMBOL(kvm_exc_entry)
>>> +EXPORT_SYMBOL_FOR_KVM(kvm_exc_entry)
>>>
>>>   /*
>>>    * int kvm_enter_guest(struct kvm_run *run, struct kvm_vcpu *vcpu)
>>> @@ -215,6 +217,14 @@ SYM_FUNC_START(kvm_enter_guest)
>>>       /* Save host GPRs */
>>>       kvm_save_host_gpr a2
>>>
>>> +    /*
>>> +     * The csr_era member variable of the pt_regs structure is required
>>> +     * for unwinding orc to perform stack traceback, so we need to put
>>> +     * pc into csr_era member variable here.
>>> +     */
>>> +    pcaddi    t0, 0
>>> +    st.d    t0, a2, PT_ERA
>> Hi, Xianglai,
>>
>> It should use `SYM_CODE_START` to mark `kvm_enter_guest` rather than
>> `SYM_FUNC_START`, since `SYM_FUNC_START` is used to mark "C-like"
>> asm functions.
>
> Ok, I will use SYM_CODE_START to mark kvm_enter_guest in the next 
> version.
>
>> I guess kvm_enter_guest is something like an exception
>> handler because its last instruction is "ertn". So usually it should
>> be marked with UNWIND_HINT_REGS, where the last frame info can be found
>> via "$sp". However, all the info is stored relative to "$a2", so this
>> mark should be
>>   `UNWIND_HINT sp_reg=ORC_REG_A2(???) type=UNWIND_HINT_TYPE_REGS`.
>> I don't know why this function's internal PC is saved here by
>> `pcaddi t0, 0`, and I think it has no meaning (exception handlers save
>> the last PC by reading CSR.ERA). `kvm_enter_guest` saves registers via
>> "$a2" ("$sp" - PT_REGS), beyond the stack ("$sp"), which is dangerous
>> if IE is enabled. So I wonder if there is really a stacktrace through
>> this function?
>>
> The stack backtracking issue in switch.S is rather complex because it 
> involves the switching between cpu root-mode and guest-mode:
> Real stack backtracking should be divided into two parts:
> part 1:
>     [<0>] kvm_enter_guest+0x38/0x11c
>     [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>     [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>     [<0>] sys_ioctl+0x498/0xf00
>     [<0>] do_syscall+0x98/0x1d0
>     [<0>] handle_syscall+0xb8/0x158
>
> part 2:
>     [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>     [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>     [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>     [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>     [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>     [<0>] kvm_exc_entry+0x104/0x1e4
>
>
> In "part 1", after executing kvm_enter_guest, the cpu switches from 
> root-mode to guest-mode.
> In this case, stack backtracking is indeed very rare.
>
> In "part 2", the cpu switches from the guest-mode to the root-mode,
> and most of the stack backtracking occurs during this phase.
>
> To obtain the longest call chain, we save pc in kvm_enter_guest to
> pt_regs.csr_era, and after restoring the sp of the root-mode cpu in
> kvm_exc_entry, the ORC entry is re-established using "UNWIND_HINT_REGS".
> We then obtain the following stack backtrace, as we wanted:
>
>     [<0>] kvm_vcpu_block+0x88/0x120 [kvm]
>     [<0>] kvm_vcpu_halt+0x68/0x580 [kvm]
>     [<0>] kvm_emu_idle+0xd4/0xf0 [kvm]
>     [<0>] kvm_handle_gspr+0x7c/0x700 [kvm]
>     [<0>] kvm_handle_exit+0x160/0x270 [kvm]
>     [<0>] kvm_exc_entry+0x104/0x1e4
I think this might be a coincidence: the behavior is correct only because
of the incorrect UNWIND_HINT_REGS mark combined with the unusual stack
adjustment.

First, kvm_enter_guest contains only a single branch instruction, ertn.
It makes the hardware jump directly to the CSR.ERA address, landing in
kvm_exc_entry.

At this point, the stack layout looks like this:
-------------------------------
   frame from call to `kvm_enter_guest`
-------------------------------  <- $sp
   PT_REGS
-------------------------------  <- $a2

Then kvm_exc_entry adjusts the stack without saving any register
(e.g. $ra, $sp), but is still marked with UNWIND_HINT_REGS.
After the adjustment:
-------------------------------
   frame from call to `kvm_enter_guest`
-------------------------------
   PT_REGS
-------------------------------  <- $a2, new $sp

During unwinding, when the unwinder reaches kvm_exc_entry,
it finds the PT_REGS mark and correctly recovers
  pc = regs.csr_era, sp = regs.sp, ra = regs.ra
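
To make that recovery step concrete, here is a minimal C model of it. This is
only a sketch under stated assumptions: `mini_pt_regs`, `mini_unwind_state`
and `mini_unwind_step_regs` are made-up names, not the kernel's structures or
the ORC unwinder's actual code. It assumes that a REGS-type hint means "a
saved register frame sits at the current $sp", which is why the trace works
once the new $sp happens to equal $a2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct pt_regs. */
struct mini_pt_regs {
	uint64_t ra;       /* saved $ra */
	uint64_t sp;       /* saved $sp of the interrupted context */
	uint64_t csr_era;  /* saved PC (CSR.ERA, or the pcaddi value here) */
};

/* Hypothetical unwinder cursor, not the kernel's struct unwind_state. */
struct mini_unwind_state {
	uint64_t pc;
	uint64_t sp;
	uint64_t ra;
};

/*
 * One unwind step for a REGS-type hint: treat the current sp as a pointer
 * to a saved register frame and reload the cursor from that frame.
 */
void mini_unwind_step_regs(struct mini_unwind_state *st)
{
	const struct mini_pt_regs *regs;

	assert(st != NULL && st->sp != 0);
	regs = (const struct mini_pt_regs *)(uintptr_t)st->sp;

	st->pc = regs->csr_era;
	st->ra = regs->ra;
	st->sp = regs->sp;
}
```

If the frame at $sp was filled in by kvm_enter_guest (including the PC stored
by `pcaddi`), one such step moves the cursor from kvm_exc_entry into
kvm_enter_guest, which matches the longer trace in the commit message.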

a) Can we avoid "ertn" and use `jr reg` (or `jirl ra, reg, 0`) instead,
like a call? kvm_exc_entry cannot return to kvm_enter_guest if we use
"ertn", so should kvm_enter_guest appear in the stack trace at all?
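
As a rough illustration of the difference behind question a), the following C
sketch models only the PC and $ra effects of the two instructions. The
`mini_cpu` structure and both helper names are hypothetical, and the model
deliberately leaves out everything else "ertn" does (for example restoring
PLV and IE from CSR.PRMD).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state: just the registers relevant to the comparison. */
struct mini_cpu {
	uint64_t pc;
	uint64_t ra;
	uint64_t csr_era;
};

/* Models `jirl ra, rj, 0`: link (ra = pc + 4), then jump to the target. */
void mini_jirl_ra(struct mini_cpu *cpu, uint64_t target)
{
	assert(cpu != NULL);
	cpu->ra = cpu->pc + 4;
	cpu->pc = target;
}

/* Models only the jump part of `ertn`: pc = CSR.ERA, ra is left untouched. */
void mini_ertn(struct mini_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->pc = cpu->csr_era;
}
```

With the call-like form, the callee can return through $ra and an unwinder
has a return address to follow. With "ertn" no link is recorded, so the only
trace of kvm_enter_guest is whatever was stored into pt_regs by hand.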

b) Can we adjust $sp before entering kvm_exc_entry? Then we can put
UNWIND_HINT_REGS at the beginning of kvm_exc_entry, similar to
ret_from_kernel_thread_asm.

>     [<0>] kvm_enter_guest+0x38/0x11c
>     [<0>] kvm_arch_vcpu_ioctl_run+0x26c/0x498 [kvm]
>     [<0>] kvm_vcpu_ioctl+0x200/0xcf8 [kvm]
>     [<0>] sys_ioctl+0x498/0xf00
>     [<0>] do_syscall+0x98/0x1d0
>     [<0>] handle_syscall+0xb8/0x158
>
> Doing so is equivalent to ignoring the details of the cpu root-mode 
> and guest-mode switching.
> As for your concern that this is dangerous when IE is enabled:
> interrupts are always off during the cpu root-mode and guest-mode
> switching in kvm_enter_guest and kvm_exc_entry.
>

