Message-ID: <20250521095239.0b254e36@gandalf.local.home>
Date: Wed, 21 May 2025 09:52:39 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: fengtian guo <fengtian_guo@...mail.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Sebastian
Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH RT] Possible spinlock deadlock in kernel/sched/rt.c
under high load
On Wed, 21 May 2025 10:35:53 +0000
fengtian guo <fengtian_guo@...mail.com> wrote:
> hardware: arm64 with 32 cores
>
> First Deadlock Root Cause Analysis
> The initial deadlock occurs due to unprotected spinlock access between
> an IRQ work thread and a hardware interrupt on the same CPU.
> Here is the critical path:
> Deadlock Sequence
> IRQ Work Thread Context (RT priority):
>
> irq_work → rto_push_irq_work_func → raw_spin_lock(&rq->lock) → push_rt_task
> The rto_push_irq_work_func thread acquires rq->lock without disabling
> interrupts.
rto_push_irq_work_func() must be called with interrupts disabled. If it is
not, then that's a bug in the implementation of irq_work!
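
If you want to see where that invariant is actually violated, a minimal
(untested) debugging sketch would be to assert the context at the top of
the function, instead of papering over it with irqsave:

	--- a/kernel/sched/rt.c
	+++ b/kernel/sched/rt.c
	@@ void rto_push_irq_work_func(struct irq_work *work)
	 	rq = this_rq();

	+	/* This must run with interrupts disabled; warn if it doesn't. */
	+	lockdep_assert_irqs_disabled();
	+

With lockdep enabled, that gives a warning with a backtrace showing
exactly how the work item was invoked.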
>
> Hardware Interrupt Context (Clock timer):
> hrtimer_interrupt → __hrtimer_run_queues → __run_hrtimer → hrtimer_wakeup →
> try_to_wake_up → ttwu_queue → raw_spin_lock(&rq->lock)
>
> The clock interrupt preempts the IRQ work thread while it holds rq->lock.
> The interrupt handler attempts to acquire the same rq->lock via
> ttwu_queue, causing a double-lock deadlock.
> Signed-off-by: Fengtian Guo <fengtian_guo@...mail.com>
> ---
> kernel/sched/rt.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 5dc1ee8dc..52a2e7bce 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2131,6 +2131,7 @@ void rto_push_irq_work_func(struct irq_work *work)
> container_of(work, struct root_domain, rto_push_work);
> struct rq *rq;
> int cpu;
> + unsigned long flags;
>
> rq = this_rq();
>
> @@ -2139,10 +2140,10 @@ void rto_push_irq_work_func(struct irq_work *work)
> * When it gets updated, a check is made if a push is possible.
> */
> if (has_pushable_tasks(rq)) {
> - raw_spin_lock(&rq->lock);
> + raw_spin_lock_irqsave(&rq->lock, flags);
> while (push_rt_task(rq, true))
> ;
> - raw_spin_unlock(&rq->lock);
> + raw_spin_unlock_irqrestore(&rq->lock, flags);
interrupts should *NEVER* be enabled here!
> }
>
> raw_spin_lock(&rd->rto_lock);
> --
In kernel/sched/topology.c we have:
rd->rto_push_work = IRQ_WORK_INIT_HARD(rto_push_irq_work_func);
That IRQ_WORK_INIT_HARD() means this function must always be called from
hard interrupt context (or with interrupts disabled), even when PREEMPT_RT
is enabled.
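
For reference, the only difference between a normal and a "hard" irq_work
is the init flag. Paraphrased from include/linux/irq_work.h in recent
kernels (check your tree, the exact layout has moved around):

	#define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){	\
		.node = { .u_flags = (_flags), },			\
		.func = (_func),					\
	}

	#define IRQ_WORK_INIT(_func)      __IRQ_WORK_INIT(_func, 0)
	#define IRQ_WORK_INIT_HARD(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_HARD_IRQ)

On PREEMPT_RT, work items without IRQ_WORK_HARD_IRQ are punted to the
irq_work kthread (preemptible, interrupts enabled); items with the flag
are still run from hard interrupt context with interrupts off.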
If the irq_work is being called without interrupts disabled, there's a bug
somewhere else.
NACK on this patch, because it's fixing a symptom of the bug and not the
bug itself.
The question is, how did this get called as a normal irq_work and not one
that was marked as "HARD"?
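
One (untested) way to narrow that down is to check the work item's flags
when it runs. Note the flag word moved into work->node.a_flags around
v5.11, so adjust for your kernel:

	/* At the top of rto_push_irq_work_func(): */
	WARN_ONCE(!(atomic_read(&work->node.a_flags) & IRQ_WORK_HARD_IRQ),
		  "rto_push_work run without IRQ_WORK_HARD_IRQ set\n");
	WARN_ONCE(!irqs_disabled(),
		  "rto_push_irq_work_func() called with interrupts enabled\n");

If either of those fires, the backtrace will show which path queued or
ran the work item incorrectly.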
-- Steve