linux-kernel - Re: [QUESTION] problems report: rcu_read_unlock_special() called in irq

Message-ID: <09e4d018-3db4-404e-a8f0-041cdee15a62@huawei.com>
Date: Tue, 1 Jul 2025 17:20:45 +0800
From: Qi Xi <xiqi2@...wei.com>
To: Joel Fernandes <joelagnelf@...dia.com>, <paulmck@...nel.org>, "Xiongfeng
 Wang" <wangxiongfeng2@...wei.com>
CC: Joel Fernandes <joel@...lfernandes.org>, <ankur.a.arora@...cle.com>,
	Frederic Weisbecker <frederic@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
	<neeraj.upadhyay@...nel.org>, <urezki@...il.com>, <rcu@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, "Wangshaobo (bobo)"
	<bobo.shaobowang@...wei.com>, Xie XiuQi <xiexiuqi@...wei.com>
Subject: Re: [QUESTION] problems report: rcu_read_unlock_special() called in
 irq_exit() causes dead loop

Hello everyone,

Friendly ping about this problem :)

Qi

On 2025/6/6 2:56, Joel Fernandes wrote:
>
> On 6/4/2025 8:26 AM, Paul E. McKenney wrote:
>>>>>>>> Or just don't send subsequent self-IPIs if we just sent one for the
>>>>>>>> rdp. Chances are, if we did not get the scheduler's attention during
>>>>>>>> the first one, we may not in subsequent ones I think. Plus we do send
>>>>>>>> other IPIs already if the grace period was over extended (from the FQS
>>>>>>>> loop), maybe we can tweak that?
>>>>>>> Thanks a lot for your reply. I think it's hard for me to fix this issue as
>>>>>>> above without introducing new bugs. I barely understand the RCU code. But I'm
>>>>>>> very glad to help test any code modification you need tested. I have
>>>>>>> the VM and the syzkaller benchmark which can reproduce the problem.
>>>>>> Sure, I understand. This is already incredibly valuable so thank you again.
>>>>>> I will ask for your testing help soon. I also have a test module now which
>>>>>> can sort of reproduce this. I'll keep you posted!
>>>>> Oh sorry, I meant to ask: could you provide the full kernel log? Also, is
>>>>> there a standalone syzkaller reproducer binary one can run to reproduce it in a VM?
>>> Sorry, I checked with the team that maintains the syzkaller tools. They said
>>> I can't send the syzkaller binary outside the company. But I can help to
>>> reproduce the problem. It's not complicated or time-consuming.
>>>
>>> I found the original log, which is from kernel v6.6, but it's not complete.
>>> So I reproduced the problem using the latest kernel.
>>> Both logs are attached.
>>>
>> Looking at both the v6.6 version and Joel's fix, I am forced to conclude
>> that this bug has been there for a very long time.  Thank you for your
>> testing efforts and Joel for the fix!
> Thanks. I am still working on polishing the fix Xiongfeng tested. I hope to have
> it out next week for review. As we discussed I will split the context-tracking
> API into a separate patch and will also add a separate documentation
> comment-patch on why we need the irq_work.
>
> thanks,
>
>   - Joel