Message-ID: <26036193-f570-3a17-e6d3-45ad70704198@loongson.cn>
Date: Fri, 12 Sep 2025 09:55:32 +0800
From: Jinyang He <hejinyang@...ngson.cn>
To: Tiezhu Yang <yangtiezhu@...ngson.cn>
Cc: Josh Poimboeuf <jpoimboe@...nel.org>, Huacai Chen
<chenhuacai@...nel.org>, Xi Zhang <zhangxi@...inos.cn>,
live-patching@...r.kernel.org, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/2] LoongArch: Return 0 for user tasks in
arch_stack_walk_reliable()
On 2025-09-11 19:49, Tiezhu Yang wrote:
> On 2025/9/10 9:11 AM, Jinyang He wrote:
>> On 2025-09-09 19:31, Tiezhu Yang wrote:
>>
>>> When testing kernel live patching with "modprobe livepatch-sample",
>>> there is a timeout of over 15 seconds from "starting patching transition"
>>> to "patching complete", and dmesg shows "unreliable stack" for user tasks
>>> in debug mode. When executing "rmmod livepatch-sample", a similar
>>> issue occurs.
>
> ...
>
>>> @@ -57,9 +62,14 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
>>> }
>>> regs->regs[1] = 0;
>>> regs->regs[22] = 0;
>>> + regs->csr_prmd = task->thread.csr_prmd;
>>> for (unwind_start(&state, task, regs);
>>> !unwind_done(&state) && !unwind_error(&state);
>>> unwind_next_frame(&state)) {
>>> + /* Success path for user tasks */
>>> + if (user_mode(regs))
>>> + return 0;
>>> +
>>> addr = unwind_get_return_address(&state);
>>> /*
>> Hi, Tiezhu,
>>
>> We update the stack info via get_stack_info() when we meet ORC_TYPE_REGS
>> in unwind_next_frame(). And in arch_stack_walk(_reliable), we always
>> call unwind_done() before unwind_next_frame(). So is there an error in
>> get_stack_info() that causes regs to be user_mode while the stack is
>> not STACK_TYPE_UNKNOWN?
>
> When testing the kernel live patching, the error code path in
> unwind_next_frame() is:
>
> switch (orc->fp_reg) {
> case ORC_REG_PREV_SP:
> p = (unsigned long *)(state->sp + orc->fp_offset);
> if (!stack_access_ok(state, (unsigned long)p,
> sizeof(unsigned long)))
> goto err;
>
> In this case, get_stack_info() does not return 0 because in_task_stack()
> is not true, so it goes to err, and state->stack_info.type =
> STACK_TYPE_UNKNOWN and state->error = true. In arch_stack_walk_reliable(),
> the loop then breaks and the function returns -EINVAL, which causes the
> unreliable stack.
The stop position of a complete stack backtrace on LoongArch should be
the top of the task stack, or the point where the address satisfies
is_entry_func. Otherwise, it is not a complete stack backtrace, and thus
I think it is an "unreliable stack".
I'm curious what the ORC info is at this PC.
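
To make the stop condition above concrete, here is a rough user-space
sketch. It is only a toy model, not the kernel code: struct toy_frame,
struct toy_stack, toy_on_task_stack() and toy_walk_reliable() are all
hypothetical names invented for illustration. A walk counts as reliable
only if it ends at the top of the task stack or in an entry function;
stopping anywhere else gives -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers are the kernel's. */
struct toy_frame {
	unsigned long sp;	/* stack pointer of this frame */
	bool is_entry_func;	/* return address lies in an entry function */
};

struct toy_stack {
	unsigned long low;	/* lowest valid address of the task stack */
	unsigned long top;	/* one past the highest valid address */
};

/* A frame can be read only if its sp points into the task stack. */
static bool toy_on_task_stack(const struct toy_stack *s, unsigned long sp)
{
	return sp >= s->low && sp < s->top;
}

/*
 * Return 0 if the walk reaches a legitimate end (stack top or an entry
 * function), -EINVAL if it stops anywhere else.
 */
static int toy_walk_reliable(const struct toy_stack *s,
			     const struct toy_frame *frames, size_t n)
{
	assert(s && frames);

	for (size_t i = 0; i < n; i++) {
		if (frames[i].is_entry_func)
			return 0;	/* complete: reached an entry function */
		if (frames[i].sp == s->top)
			return 0;	/* complete: reached the stack top */
		if (!toy_on_task_stack(s, frames[i].sp))
			return -EINVAL;	/* sp left the task stack early */
	}

	return -EINVAL;	/* ran out of frames before a valid end */
}
```

In this model, a frame whose sp falls outside the task stack before
either end condition is met is reported as unreliable, which is the
situation described for the in_task_stack() failure.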