Message-Id: <8B6C255A-4F2D-4928-BF9D-17F7E3C4BA3E@m.fudan.edu.cn>
Date: Mon, 6 Jan 2025 14:37:18 +0800
From: Kun Hu <huk23@...udan.edu.cn>
To: paulmck@...nel.org
Cc: frederic@...nel.org,
neeraj.upadhyay@...nel.org,
joel@...lfernandes.org,
josh@...htriplett.org,
boqun.feng@...il.com,
urezki@...il.com,
rostedt@...dmis.org,
mathieu.desnoyers@...icios.com,
jiangshanlai@...il.com,
qiang.zhang1211@...il.com,
rcu@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: watchdog: BUG: soft lockup in note_gp_changes in
kernel/rcu/tree.c
> On Jan 3, 2025, at 08:16, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Thu, Jan 02, 2025 at 10:59:27AM +0800, Kun Hu wrote:
>> Hello,
>>
>> When using our custom fuzzer tool to fuzz the latest Linux kernel, the following crash
>> was triggered.
>>
>> HEAD commit: dbfac60febfa806abb2d384cb6441e77335d2799
>> git tree: upstream
>> Console output: https://drive.google.com/file/d/1D3EDxDxPi0t7m_Z4Uc4FuL26DnHs7yTa/view?usp=sharing
>> Kernel config: https://drive.google.com/file/d/1m1mk_YusR-tyusNHFuRbzdj8KUzhkeHC/view?usp=sharing
>> C reproducer: /
>> Syzlang reproducer: /
>>
>> We observed a crash at line 1333 in note_gp_changes, likely caused by a race condition involving rcu_gp_kthread_wake and note_gp_changes. The issue appears to involve insufficient or incorrect synchronization, as indicated by the involvement of _raw_spin_unlock_irqrestore in spinlock.c. Specifically, this may lead to invalid accesses to rcu_state.gp_kthread or related flags (e.g., gp_flags), potentially resulting in unexpected behavior in swake_up_one_online.
>>
>> Could you please help check if this needs to be addressed?
>
> This is a new one on me.
>
> This is running in a guest OS. Might the underlying hypervisor be
> overloaded? That could result in vCPU preemption and thus in this sort
> of soft lockup.
>
> Also, when I check out the above commit (which is v6.13-rc4), I find that
> line 1333 is the close curly brace of note_gp_changes(). Of course, it is
> possible that the address-to-symbol translation failed (please check!),
> but in the absence of such failure, there is no way that I know of that
> incorrect synchronization could cause a soft lockup at that location.
>
> Other things besides vCPU preemption that could cause a soft lockup at
> that location include corrupted kernel text, corrupted kernel stack,
> and incessant interrupts.
>
> Other thoughts?
>
> Thanx, Paul
>
Sorry for the late reply.
I double-checked: the address-to-symbol translation is not failing, and the vCPU resources are not overloaded. I also ran multiple rounds of reproduction with Syzkaller and obtained two kinds of reproducers, a C program and a syscall sequence. I'm not sure whether other issues are involved; that's all I can offer for now.
I'm not sure whether this information is useful to you. If this turns out not to be a real bug, please ignore it.
C reproducer: https://drive.google.com/file/d/1niejFamwXcRumUsn1Ur8xiX2jfZAcown/view?usp=sharing
Syscall sequence reproducer: https://drive.google.com/file/d/1gBfe_WZZeHfrhTlXp5zJfV7be21iGCAC/view?usp=sharing
New log info: https://drive.google.com/file/d/1x7eugPh2RUUF9lOf3s9K64pARkkUE1Qn/view?usp=sharing
----
Thanks,
Kun Hu