Message-ID: <20250521095239.0b254e36@gandalf.local.home>
Date: Wed, 21 May 2025 09:52:39 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: fengtian guo <fengtian_guo@...mail.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Sebastian
Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH RT] Possible spinlock deadlock in kernel/sched/rt.c
under high load
On Wed, 21 May 2025 10:35:53 +0000
fengtian guo <fengtian_guo@...mail.com> wrote:
> hardware: arm64 with 32 cores
>
> First Deadlock Root Cause Analysis
> The initial deadlock occurs due to unprotected spinlock access between
> an IRQ work thread and a hardware interrupt on the same CPU.
> Here is the critical path:
> Deadlock Sequence
> IRQ Work Thread Context (RT priority):
>
> irq_work → rto_push_irq_work_func → raw_spin_lock(&rq->lock) → push_rt_task
> The rto_push_irq_work_func thread acquires rq->lock without disabling
> interrupts.
rto_push_irq_work_func() must be called with interrupts disabled. If it is
not, then that's a bug in the implementation of irq_work!
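
If you want to see where that invariant is actually violated, a minimal
(untested) debugging sketch would be to assert the context at the top of
the function, instead of papering over it with irqsave:

	--- a/kernel/sched/rt.c
	+++ b/kernel/sched/rt.c
	@@ void rto_push_irq_work_func(struct irq_work *work)
	 	rq = this_rq();

	+	/* This must run with interrupts disabled; warn if it doesn't. */
	+	lockdep_assert_irqs_disabled();
	+

With lockdep enabled, that gives a warning with a backtrace showing
exactly how the work item was invoked.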
>
> Hardware Interrupt Context (Clock timer):
> hrtimer_interrupt → __hrtimer_run_queues → __run_hrtimer → hrtimer_wakeup →
> try_to_wake_up → ttwu_queue → raw_spin_lock(&rq->lock)
>
> The clock interrupt preempts the IRQ work thread while it holds rq->lock.
> The interrupt handler attempts to acquire the same rq->lock via
> ttwu_queue, causing a double-lock deadlock.
> Signed-off-by: Fengtian Guo <fengtian_guo@...mail.com>
> ---
> kernel/sched/rt.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 5dc1ee8dc..52a2e7bce 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2131,6 +2131,7 @@ void rto_push_irq_work_func(struct irq_work *work)
> container_of(work, struct root_domain, rto_push_work);
> struct rq *rq;
> int cpu;
> + unsigned long flags;
>
> rq = this_rq();
>
> @@ -2139,10 +2140,10 @@ void rto_push_irq_work_func(struct irq_work *work)
> * When it gets updated, a check is made if a push is possible.
> */
> if (has_pushable_tasks(rq)) {
> - raw_spin_lock(&rq->lock);
> + raw_spin_lock_irqsave(&rq->lock, flags);
> while (push_rt_task(rq, true))
> ;
> - raw_spin_unlock(&rq->lock);
> + raw_spin_unlock_irqrestore(&rq->lock, flags);
interrupts should *NEVER* be enabled here!
> }
>
> raw_spin_lock(&rd->rto_lock);
> --
In kernel/sched/topology.c we have:
rd->rto_push_work = IRQ_WORK_INIT_HARD(rto_push_irq_work_func);
That IRQ_WORK_INIT_HARD() means this function must always be called from
hard interrupt context (or with interrupts disabled), even when PREEMPT_RT
is enabled.
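
For reference, the only difference between a normal and a "hard" irq_work
is the init flag. Paraphrased from include/linux/irq_work.h in recent
kernels (check your tree, the exact layout has moved around):

	#define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){	\
		.node = { .u_flags = (_flags), },			\
		.func = (_func),					\
	}

	#define IRQ_WORK_INIT(_func)      __IRQ_WORK_INIT(_func, 0)
	#define IRQ_WORK_INIT_HARD(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_HARD_IRQ)

On PREEMPT_RT, work items without IRQ_WORK_HARD_IRQ are punted to the
irq_work kthread (preemptible, interrupts enabled); items with the flag
are still run from hard interrupt context with interrupts off.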
If the irq_work is being called without interrupts disabled, there's a bug
somewhere else.
NACK on this patch, because it's fixing a symptom of the bug and not the
bug itself.
The question is, how did this get called as a normal irq_work and not one
that was marked as "HARD"?
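
One (untested) way to narrow that down is to check the work item's flags
when it runs. Note the flag word moved into work->node.a_flags around
v5.11, so adjust for your kernel:

	/* At the top of rto_push_irq_work_func(): */
	WARN_ONCE(!(atomic_read(&work->node.a_flags) & IRQ_WORK_HARD_IRQ),
		  "rto_push_work run without IRQ_WORK_HARD_IRQ set\n");
	WARN_ONCE(!irqs_disabled(),
		  "rto_push_irq_work_func() called with interrupts enabled\n");

If either of those fires, the backtrace will show which path queued or
ran the work item incorrectly.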
-- Steve