[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6d478129-324a-4c5a-8258-4abe3d0732d6@ghiti.fr>
Date: Wed, 2 Jul 2025 15:05:06 +0200
From: Alexandre Ghiti <alex@...ti.fr>
To: Yao Zi <ziyao@...root.org>, Andy Chiu <andybnac@...il.com>,
alexghiti@...osinc.com, palmer@...belt.com, Andy Chiu
<andy.chiu@...ive.com>, Björn Töpel
<bjorn@...osinc.com>, Mark Rutland <mark.rutland@....com>,
puranjay12@...il.com, paul.walmsley@...ive.com, greentime.hu@...ive.com,
nick.hu@...ive.com, nylon.chen@...ive.com, eric.lin@...ive.com,
vicent.chen@...ive.com, zong.li@...ive.com, yongxuan.wang@...ive.com,
samuel.holland@...ive.com, olivia.chu@...ive.com, c2232430@...il.com
Cc: Han Gao <rabenda.cn@...il.com>, Vivian Wang <wangruikang@...as.ac.cn>,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
regressions@...ts.linux.dev, linux-riscv@...ts.infradead.org
Subject: Re: [REGRESSION] Random oops on SG2042 with Linux 6.16-rc and dynamic
ftrace
Hi Yao,
On 7/2/25 12:50, Yao Zi wrote:
> On Tue, Jul 01, 2025 at 02:27:32PM +0200, Alexandre Ghiti wrote:
>> Hi Yao,
>>
>> On 7/1/25 08:41, Yao Zi wrote:
>>> Linux v6.16 built with dynamic ftrace randomly oops or triggers
>>> ftrace_bug() on Sophgo SG2042 when booting systemd-based userspace,
> ...
>
>>> Not sure either reverting the commits or fixing them up is a better
>>> idea, but anyway the fatal first issue shouidn't go into the stable
>>> release.
>> Let's fix this, we were expecting issues with dynamic ftrace :)
>>
>> So the following diff fixes all the issues you mentioned (not the first
>> crash though, I'll let you test and see if it works better, I don't have
>> this board):
> Thanks for the fix! I've tested it with both QEMU and SG2042, it does
> fix the lockdep failures as well as the boot time crash on SG2042. The
> boot-time crash is caused by the race so will disappear as long as we
> fix the race.
>
>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>> index 4c6c24380cfd9..97ced537aa1e0 100644
>> --- a/arch/riscv/kernel/ftrace.c
>> +++ b/arch/riscv/kernel/ftrace.c
>> @@ -14,6 +14,16 @@
>> #include <asm/text-patching.h>
>>
>> #ifdef CONFIG_DYNAMIC_FTRACE
>> +void ftrace_arch_code_modify_prepare(void)
>> +{
>> + mutex_lock(&text_mutex);
>> +}
>> +
>> +void ftrace_arch_code_modify_post_process(void)
>> +{
>> + mutex_unlock(&text_mutex);
>> +}
>> +
>> unsigned long ftrace_call_adjust(unsigned long addr)
>> {
>> if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
>> @@ -29,10 +39,8 @@ unsigned long arch_ftrace_get_symaddr(unsigned long
>> fentry_ip)
>>
>> void arch_ftrace_update_code(int command)
>> {
>> - mutex_lock(&text_mutex);
>> command |= FTRACE_MAY_SLEEP;
>> ftrace_modify_all_code(command);
>> - mutex_unlock(&text_mutex);
>> flush_icache_all();
>> }
>>
>> @@ -149,16 +157,17 @@ int ftrace_init_nop(struct module *mod, struct
>> dyn_ftrace *rec)
>> unsigned int nops[2], offset;
>> int ret;
>>
>> + mutex_lock(&text_mutex);
> Besides using the guard API, could we swap the order between
> ftrace_rec_set_nop_ops() and calculation of the nops array? This shrinks
> the critical region a little.
If you don't mind, I won't, I don't like initializing stuff which could
never be used in case of error.
>
> With or without the change, here's my tag,
>
> Tested-by: Yao Zi <ziyao@...root.org>
>
> and also
>
> Reported-by: Han Gao <rabenda.cn@...il.com>
> Reported-by: Vivian Wang <wangruikang@...as.ac.cn>
>
> for their first-hand report of boot-time crash and analysis for the
> first lock issue.
I'll add all those tags in the patch I'll send today (or tomorrow if the
CI is slow).
Thanks again for the great bug report, really appreciated.
Alex
>
> Regards,
> Yao Zi
>
>> ret = ftrace_rec_set_nop_ops(rec);
>> if (ret)
>> - return ret;
>> + goto end;
>>
>> offset = (unsigned long) &ftrace_caller - pc;
>> nops[0] = to_auipc_t0(offset);
>> nops[1] = RISCV_INSN_NOP4;
>>
>> - mutex_lock(&text_mutex);
>> ret = patch_insn_write((void *)pc, nops, 2 * MCOUNT_INSN_SIZE);
>> +end:
>> mutex_unlock(&text_mutex);
>>
>> return ret;
>>
>> Andy is also taking a look, I'll let him confirm the above fix is correct.
>>
>> Thanks for the thorough report!
>>
>> Alex
>>
>>
>>> Thanks for your suggestions on the problems.
>>>
>>> Regards,
>>> Yao Zi
>>>
>>> [1]: https://lore.kernel.org/all/20250407180838.42877-1-andybnac@gmail.com/
>>>
>>> #regzbot introduced: 881dadf0792c
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@...ts.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@...ts.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Powered by blists - more mailing lists