linux-kernel - Re: [QUESTION] problems report: rcu_read_unlock_special() called in irq_exit() causes dead loop

Message-ID: <0f322c7b-d8b7-ff0a-5c98-26230a9fbad0@huawei.com>
Date: Wed, 4 Jun 2025 11:25:50 +0800
From: Xiongfeng Wang <wangxiongfeng2@...wei.com>
To: Joel Fernandes <joelagnelf@...dia.com>
CC: Joel Fernandes <joel@...lfernandes.org>, <ankur.a.arora@...cle.com>,
	Frederic Weisbecker <frederic@...nel.org>, "Paul E . McKenney"
	<paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
	<neeraj.upadhyay@...nel.org>, <urezki@...il.com>, <rcu@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <xiqi2@...wei.com>, "Wangshaobo (bobo)"
	<bobo.shaobowang@...wei.com>
Subject: Re: [QUESTION] problems report: rcu_read_unlock_special() called in
 irq_exit() causes dead loop



On 2025/6/4 9:35, Joel Fernandes wrote:
> On Tue, Jun 03, 2025 at 03:22:42PM -0400, Joel Fernandes wrote:
>>
>>
>> On 6/3/2025 3:03 PM, Joel Fernandes wrote:
>>>
>>>
>>> On 6/3/2025 2:59 PM, Joel Fernandes wrote:
>>>> On Fri, May 30, 2025 at 09:55:45AM +0800, Xiongfeng Wang wrote:
>>>>> Hi Joel,
>>>>>
>>>>> On 2025/5/29 0:30, Joel Fernandes wrote:
>>>>>> On Wed, May 21, 2025 at 5:43 AM Xiongfeng Wang
>>>>>> <wangxiongfeng2@...wei.com> wrote:
>>>>>>>
>>>>>>> Hi RCU experts,
>>>>>>>
>>>>>>> When I ran syzkaller on Linux 6.6 with CONFIG_PREEMPT_RCU enabled, I got
>>>>>>> the following soft lockup. The call trace is too long, so I put it at the end.
>>>>>>> The issue can also be reproduced on the latest kernel.
>>>>>>>
>>>>>>> The issue is as follows. CPU3 is waiting for a spin_lock held by CPU1,
>>>>>>> but CPU1 is stuck in the following dead loop.
>>>>>>>
>>>>>>> irq_exit()
>>>>>>>   __irq_exit_rcu()
>>>>>>>     /* in_hardirq() returns false after this */
>>>>>>>     preempt_count_sub(HARDIRQ_OFFSET)
>>>>>>>     tick_irq_exit()
>>>>>>>       tick_nohz_irq_exit()
>>>>>>>             tick_nohz_stop_sched_tick()
>>>>>>>               trace_tick_stop()  /* a bpf prog is hooked on this trace point */
>>>>>>>                    __bpf_trace_tick_stop()
>>>>>>>                       bpf_trace_run2()
>>>>>>>                             rcu_read_unlock_special()
>>>>>>>                               /* will send an IPI to itself */
>>>>>>>                               irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
>>>>>>>
>>>>>>> /* after interrupts are re-enabled, the irq_work runs */
>>>>>>> asm_sysvec_irq_work()
>>>>>>>   sysvec_irq_work()
>>>>>>> irq_exit() /* after handling the irq_work, we enter irq_exit() again */
>>>>>>>   __irq_exit_rcu()
>>>>>>>     ...skip...
>>>>>>>            /* we queue an irq_work again and enter a dead loop */
>>>>>>>            irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
>>>>>>
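The dead loop in the trace above can be reduced to a small user-space model. This is a hypothetical sketch, not kernel code: `struct model_cpu`, `model_unlock_special()`, `model_irq_exit()` and `model_run()` are invented names, and the model assumes, as the report implies, that `need_qs` stays set because no quiescent state is reported between one `irq_exit()` and the next. The `use_guard` flag stands in for the `!ct_in_irq()` check in the candidate patch later in this message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the loop in the trace; not kernel code.
 * Field names only loosely mirror the kernel state they stand in for.
 */
struct model_cpu {
	bool need_qs;          /* ~ current->rcu_read_unlock_special.b.need_qs */
	bool iw_pending;       /* ~ rdp->defer_qs_iw_pending */
	bool in_ct_irq;        /* context tracking still reports "in IRQ" */
};

/* ~ rcu_read_unlock_special(): may queue a self-IPI irq_work. */
static void model_unlock_special(struct model_cpu *c, bool use_guard)
{
	if (!c->need_qs || c->iw_pending)
		return;
	if (use_guard && c->in_ct_irq)
		return;        /* candidate fix: no self-IPI while exiting an IRQ */
	c->iw_pending = true;  /* ~ irq_work_queue_on(&rdp->defer_qs_iw, cpu) */
}

/* ~ irq_exit(): tick_irq_exit() -> trace_tick_stop() -> BPF prog. */
static void model_irq_exit(struct model_cpu *c, bool use_guard)
{
	c->in_ct_irq = true;   /* ct_irq_exit() has not run yet at this point */
	model_unlock_special(c, use_guard);
	c->in_ct_irq = false;  /* ~ ct_irq_exit() */
}

/*
 * Count irq_work interrupts, capped at max_irqs. need_qs is never
 * cleared: the model assumes no quiescent state is reported between
 * consecutive IRQ exits, as the report describes.
 */
static int model_run(bool use_guard, int max_irqs)
{
	struct model_cpu c = { .need_qs = true };
	int fired = 0;

	assert(max_irqs >= 0);
	model_irq_exit(&c, use_guard);            /* the original interrupt exits */
	while (c.iw_pending && fired < max_irqs) {
		c.iw_pending = false;             /* handler clears the pending flag */
		fired++;
		model_irq_exit(&c, use_guard);    /* the irq_work IRQ exits again */
	}
	return fired;
}
```

Without the guard, `model_run(false, 1000)` keeps re-queuing until it hits the cap of 1000 self-IPIs, which is the model's version of the soft lockup; with the guard, `model_run(true, 1000)` returns 0. In the patch, `set_tsk_need_resched()` and `set_preempt_need_resched()` still run before the new check, so skipping the irq_work only drops the self-IPI, not the request to reschedule.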
> 
> The following is a candidate fix (among other fixes being
> considered/discussed). The change is to check if context tracking thinks
> we're in IRQ and, if so, avoid the irq_work. IMO, this should be rare enough
> not to be an issue, and self-IPIing repeatedly while we're exiting an IRQ is
> dangerous anyway.
> 
> Thoughts? Xiongfeng, do you want to try it?

Thanks a lot for the fast response. My colleague is testing the modification.
She will report back with the results.

Thanks,
Xiongfeng
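Why can `ct_in_irq()` catch this case when `in_hardirq()` cannot? In the call trace, `preempt_count_sub(HARDIRQ_OFFSET)` runs inside `__irq_exit_rcu()` before `tick_irq_exit()`, while the context-tracking exit in `irq_exit()` runs only after `__irq_exit_rcu()` returns. The sketch below models that ordering with two plain counters. It assumes the generic `irq_enter()`/`irq_exit()` path from `kernel/softirq.c` seen in the trace; the `order_*` names are hypothetical, and the real `ct_nmi_nesting()` value uses a more complex encoding than a simple IRQ count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code. 'hardirq' stands in for the
 * HARDIRQ bits of preempt_count; 'ct_nesting' stands in for the
 * context-tracking nesting read by ct_nmi_nesting().
 */
struct order_counts {
	int hardirq;
	int ct_nesting;
};

/* What each predicate returns when the tick_stop tracepoint fires. */
struct order_snapshot {
	bool in_hardirq;
	bool ct_in_irq;
};

/* irq_enter(): ct_irq_enter() first, then irq_enter_rcu() raises HARDIRQ. */
static void order_irq_enter(struct order_counts *c)
{
	c->ct_nesting++;
	c->hardirq++;
}

/*
 * irq_exit(): __irq_exit_rcu() drops HARDIRQ and then calls
 * tick_irq_exit(), where the BPF-hooked tracepoint runs; only after
 * __irq_exit_rcu() returns does ct_irq_exit() drop the nesting.
 */
static struct order_snapshot order_irq_exit(struct order_counts *c)
{
	struct order_snapshot s;

	assert(c->hardirq > 0 && c->ct_nesting > 0);
	c->hardirq--;                          /* preempt_count_sub(HARDIRQ_OFFSET) */
	s.in_hardirq = c->hardirq != 0;        /* sampled in trace_tick_stop() */
	s.ct_in_irq = c->ct_nesting != 0;
	c->ct_nesting--;                       /* ct_irq_exit() */
	return s;
}
```

At the modeled tracepoint, `in_hardirq` is false while `ct_in_irq` is still true, matching the comment after `preempt_count_sub()` in the trace; that window is what the `!ct_in_irq()` check relies on. Both counters are back at zero once the modeled `irq_exit()` completes.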

> 
> Btw, I could easily reproduce it as a boot hang by doing:
> 
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -638,6 +638,10 @@ void irq_enter(void)
>  
>  static inline void tick_irq_exit(void)
>  {
> +	rcu_read_lock();
> +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
> +	rcu_read_unlock();
> +
>  #ifdef CONFIG_NO_HZ_COMMON
>  	int cpu = smp_processor_id();
>  
> ---8<-----------------------
> 
> From: Joel Fernandes <joelagnelf@...dia.com>
> Subject: [PATCH] Do not schedule irq_work when IRQ is exiting
> 
> Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
> ---
>  include/linux/context_tracking_irq.h |  2 ++
>  kernel/context_tracking.c            | 12 ++++++++++++
>  kernel/rcu/tree_plugin.h             |  3 ++-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h
> index 197916ee91a4..35a5ad971514 100644
> --- a/include/linux/context_tracking_irq.h
> +++ b/include/linux/context_tracking_irq.h
> @@ -9,6 +9,7 @@ void ct_irq_enter_irqson(void);
>  void ct_irq_exit_irqson(void);
>  void ct_nmi_enter(void);
>  void ct_nmi_exit(void);
> +bool ct_in_irq(void);
>  #else
>  static __always_inline void ct_irq_enter(void) { }
>  static __always_inline void ct_irq_exit(void) { }
> @@ -16,6 +17,7 @@ static inline void ct_irq_enter_irqson(void) { }
>  static inline void ct_irq_exit_irqson(void) { }
>  static __always_inline void ct_nmi_enter(void) { }
>  static __always_inline void ct_nmi_exit(void) { }
> +static inline bool ct_in_irq(void) { return false; }
>  #endif
>  
>  #endif
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index fb5be6e9b423..8e8055cf04af 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -392,6 +392,18 @@ noinstr void ct_irq_exit(void)
>  	ct_nmi_exit();
>  }
>  
> +/**
> + * ct_in_irq - check if CPU is currently in a tracked IRQ context.
> + *
> + * Returns true if ct_irq_enter() has been called and ct_irq_exit()
> + * has not yet been called. This indicates the CPU is currently
> + * processing an interrupt.
> + */
> +bool ct_in_irq(void)
> +{
> +	return ct_nmi_nesting() != 0;
> +}
> +
>  /*
>   * Wrapper for ct_irq_enter() where interrupts are enabled.
>   *
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c0bbbbb686f..a3eebd4c841e 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -673,7 +673,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  			set_tsk_need_resched(current);
>  			set_preempt_need_resched();
>  			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
> -			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> +			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu) &&
> +			    !ct_in_irq()) {
>  				// Get scheduler to re-evaluate and call hooks.
>  				// If !IRQ_WORK, FQS scan will eventually IPI.
>  				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
>