linux-kernel - Re: watchdog: BUG: soft lockup in note_gp


Message-Id: <8B6C255A-4F2D-4928-BF9D-17F7E3C4BA3E@m.fudan.edu.cn>
Date: Mon, 6 Jan 2025 14:37:18 +0800
From: Kun Hu <huk23@...udan.edu.cn>
To: paulmck@...nel.org
Cc: frederic@...nel.org,
 neeraj.upadhyay@...nel.org,
 joel@...lfernandes.org,
 josh@...htriplett.org,
 boqun.feng@...il.com,
 urezki@...il.com,
 rostedt@...dmis.org,
 mathieu.desnoyers@...icios.com,
 jiangshanlai@...il.com,
 qiang.zhang1211@...il.com,
 rcu@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: watchdog: BUG: soft lockup in note_gp_changes in
 kernel/rcu/tree.c



> On Jan 3, 2025, at 08:16, Paul E. McKenney <paulmck@...nel.org> wrote:
> 
> On Thu, Jan 02, 2025 at 10:59:27AM +0800, Kun Hu wrote:
>> Hello,
>> 
>> When using our custom fuzzing tool to fuzz the latest Linux kernel, the following crash
>> was triggered.
>> 
>> HEAD commit: dbfac60febfa806abb2d384cb6441e77335d2799
>> git tree: upstream
>> Console output: https://drive.google.com/file/d/1D3EDxDxPi0t7m_Z4Uc4FuL26DnHs7yTa/view?usp=sharing
>> Kernel config: https://drive.google.com/file/d/1m1mk_YusR-tyusNHFuRbzdj8KUzhkeHC/view?usp=sharing
>> C reproducer: /
>> Syzlang reproducer: /
>> 
>> We observed a crash at line 1333 in note_gp_changes, likely caused by a race condition involving rcu_gp_kthread_wake and note_gp_changes. The issue appears to involve insufficient or incorrect synchronization, as indicated by the involvement of _raw_spin_unlock_irqrestore in spinlock.c. Specifically, this may lead to invalid accesses to rcu_state.gp_kthread or related flags (e.g., gp_flags), potentially resulting in unexpected behavior in swake_up_one_online.
>> 
>> Could you please help check if this needs to be addressed?
> 
> This is a new one on me.
> 
> This is running in a guest OS.  Might the underlying hypervisor be
> overloaded?  That could result in vCPU preemption and thus in this sort
> of soft lockup.
> 
> Also, when I check out the above commit (which is v6.13-rc4), I find that
> line 1333 is the close curly brace of note_gp_changes().  Of course, it is
> possible that the address-to-symbol translation failed (please check!),
> but in the absence of such failure, there is no way that I know of that
> incorrect synchronization could cause a soft lockup at that location.
> 
> Other things besides vCPU preemption that could cause a soft lockup at
> that location include corrupted kernel text, corrupted kernel stack,
> and incessant interrupts.
> 
> Other thoughts?
> 
> Thanx, Paul
> 

Sorry for the late reply.

I double-checked: the address-to-symbol translation is not failing, and the vCPU resources are not overloaded. I also ran several rounds with Syzkaller and obtained two kinds of reproducers, a C program and a syscall sequence. I'm not sure whether other issues are involved; this is all I can offer for now.
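
For reference, one way to re-check the address-to-symbol translation is the kernel tree's scripts/faddr2line helper, which takes a vmlinux built with debug info and one or more func+offset/size strings. The following is a minimal Python sketch, not necessarily the procedure used for this report, that pulls such strings out of a console log and builds the command line. The sample trace lines and their offsets are hypothetical placeholders rather than values from the attached log, and the names extract_symbols and faddr2line_command are illustrative only.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" as printed in kernel oops/lockup traces.
SYM_RE = re.compile(r"([A-Za-z_][\w.]*)\+(0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+)")


def extract_symbols(log_text):
    """Return unique symbol+offset/size strings in order of first appearance."""
    seen = []
    for match in SYM_RE.finditer(log_text):
        entry = match.group(0)
        if entry not in seen:
            seen.append(entry)
    return seen


def faddr2line_command(vmlinux, symbols):
    """Build the argv for scripts/faddr2line (run from the kernel source tree)."""
    return ["./scripts/faddr2line", vmlinux, *symbols]


# Hypothetical sample: offsets and sizes are placeholders, NOT from the report.
SAMPLE = """\
RIP: 0010:note_gp_changes+0x12/0x34
 rcu_core+0x56/0x78
 rcu_core+0x56/0x78
"""

if __name__ == "__main__":
    # Print the command instead of running it, since vmlinux may not be present.
    print(" ".join(faddr2line_command("vmlinux", extract_symbols(SAMPLE))))
```

Running the printed command against the vmlinux that produced the log should show which source line each frame resolves to; that result can then be compared with the line number reported in the trace.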

I'm not sure whether this information is useful to you. If it turns out not to be a real bug, please ignore it.

C reproducer: https://drive.google.com/file/d/1niejFamwXcRumUsn1Ur8xiX2jfZAcown/view?usp=sharing
Syscall sequence reproducer: https://drive.google.com/file/d/1gBfe_WZZeHfrhTlXp5zJfV7be21iGCAC/view?usp=sharing
New log info: https://drive.google.com/file/d/1x7eugPh2RUUF9lOf3s9K64pARkkUE1Qn/view?usp=sharing

----
Thanks,
Kun Hu