Message-ID: <20251124080644.3871678-2-sashal@kernel.org>
Date: Mon, 24 Nov 2025 03:06:16 -0500
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
stable@...r.kernel.org
Cc: Zqiang <qiang.zhang@...ux.dev>,
Tejun Heo <tj@...nel.org>,
Sasha Levin <sashal@...nel.org>,
mingo@...hat.com,
peterz@...radead.org,
juri.lelli@...hat.com,
vincent.guittot@...aro.org,
bigeasy@...utronix.de,
clrkwllms@...nel.org,
rostedt@...dmis.org,
sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org,
linux-rt-devel@...ts.linux.dev
Subject: [PATCH AUTOSEL 6.17] sched_ext: Fix possible deadlock in the deferred_irq_workfn()
From: Zqiang <qiang.zhang@...ux.dev>
[ Upstream commit a257e974210320ede524f340ffe16bf4bf0dda1e ]
For PREEMPT_RT=y kernels, deferred_irq_workfn() is executed in the
per-CPU irq_work/* task context with interrupts enabled. If the rq
returned by container_of() is the current CPU's rq, the following
scenario may occur:
lock(&rq->__lock);
<Interrupt>
lock(&rq->__lock);
This commit uses IRQ_WORK_INIT_HARD() instead of init_irq_work() to
initialize rq->scx.deferred_irq_work, so that deferred_irq_workfn()
is always invoked in hard-irq context.
Signed-off-by: Zqiang <qiang.zhang@...ux.dev>
Signed-off-by: Tejun Heo <tj@...nel.org>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
LLM Generated explanations, may be completely bogus:
## Comprehensive Analysis
### 1. Commit Message and Problem Analysis
The commit addresses a **critical deadlock** scenario in the `sched_ext`
(BPF extensible scheduler) subsystem.
- **The Issue:** On `CONFIG_PREEMPT_RT` (Real-Time) kernels, `irq_work`
items initialized with `init_irq_work()` default to running in a
per-CPU thread context with interrupts enabled. The work function
`deferred_irq_workfn()` acquires the runqueue lock
(`raw_spin_rq_lock(rq)`) without disabling interrupts. If an interrupt
arrives on the same CPU while this lock is held, and the interrupt
handler also attempts to acquire `rq->__lock` (a standard scheduler
pattern), the CPU spins on a lock it already holds (A-A deadlock).
- **The Fix:** The commit changes the initialization of
`deferred_irq_work` to use `IRQ_WORK_INIT_HARD()`. This forces the
work function to execute in **hard interrupt context** (with
interrupts disabled), preventing the nested interrupt scenario that
causes the deadlock.
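
The context selection described above can be sketched as a small
userspace model. This is not kernel code: every `model_*` / `MODEL_*`
name is hypothetical, the model ignores lazy work items, and it only
assumes what the commit message states (on PREEMPT_RT, work without the
hard-irq flag runs in the per-CPU `irq_work/*` task with interrupts
enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's IRQ_WORK_HARD_IRQ flag. */
#define MODEL_IRQ_WORK_HARD_IRQ (1u << 0)

/* Where the work function ends up running in this model. */
enum model_ctx {
	MODEL_CTX_HARD_IRQ,		/* hard-irq context, interrupts disabled */
	MODEL_CTX_IRQ_WORK_THREAD,	/* per-CPU irq_work/* task, interrupts enabled */
};

struct model_irq_work {
	unsigned int flags;
};

/*
 * Assumption taken from the commit message: only PREEMPT_RT kernels
 * move work without the hard-irq flag into the per-CPU thread.
 */
enum model_ctx model_exec_ctx(const struct model_irq_work *work, bool preempt_rt)
{
	assert(work != NULL);
	if (preempt_rt && !(work->flags & MODEL_IRQ_WORK_HARD_IRQ))
		return MODEL_CTX_IRQ_WORK_THREAD;
	return MODEL_CTX_HARD_IRQ;
}

/*
 * The work function takes rq->__lock without disabling interrupts, so
 * the A-A deadlock is only possible when it runs with interrupts
 * enabled, i.e. in the thread context.
 */
bool model_aa_deadlock_possible(const struct model_irq_work *work, bool preempt_rt)
{
	return model_exec_ctx(work, preempt_rt) == MODEL_CTX_IRQ_WORK_THREAD;
}
```

In this model, a work item with `flags == 0` is exposed to the deadlock
only when `preempt_rt` is true. Setting `MODEL_IRQ_WORK_HARD_IRQ`
removes the exposure on both kernel configurations.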
### 2. Deep Code Research & Verification
- **Subsystem Context:** `sched_ext` was merged in Linux v6.12. The
buggy code exists in all stable kernels starting from v6.12.y up to
the current v6.17.y. Older LTS kernels (6.6.y, 6.1.y) do not contain
`sched_ext` and are unaffected.
- **Code Mechanics:**
- **Buggy Code:** `init_irq_work(&rq->scx.deferred_irq_work,
deferred_irq_workfn);` relies on defaults which are unsafe for this
locking pattern on PREEMPT_RT.
- **Corrected Code:** `rq->scx.deferred_irq_work =
IRQ_WORK_INIT_HARD(deferred_irq_workfn);` explicitly sets the
`IRQ_WORK_HARD_IRQ` flag.
- **Precedent:** This pattern is well-established in the scheduler
core (e.g., `rto_push_work` in `kernel/sched/topology.c` uses
`IRQ_WORK_INIT_HARD` for the exact same reason).
- **Correctness:** `deferred_irq_workfn` calls `run_deferred`, which
uses `raw_spin_rq_lock`. These locks are safe to take in hard-irq
context. The fix is technically sound and adheres to locking rules.
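
The fix swaps a function call for a struct assignment, and the handler
finds its runqueue through `container_of()`. A simplified userspace
sketch of both ideas follows. All `sketch_*` / `SKETCH_*` names are
hypothetical, the struct layouts are reduced to what the example needs,
and the counter stands in for the real lock-protected work. It assumes,
as the diff suggests, that the `_HARD` initializer yields a struct value
with the hard-irq flag set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IRQ_WORK_HARD_IRQ. */
#define SKETCH_WORK_HARD_IRQ (1u << 0)

struct sketch_work {
	unsigned int flags;
	void (*func)(struct sketch_work *);
};

/* Initializers that expand to a struct value (compound literal). */
#define SKETCH_WORK_INIT(_func, _flags) \
	((struct sketch_work){ .flags = (_flags), .func = (_func) })
#define SKETCH_WORK_INIT_HARD(_func) \
	SKETCH_WORK_INIT(_func, SKETCH_WORK_HARD_IRQ)

/* Function-style initializer: always leaves the flags at their default. */
void sketch_init_work(struct sketch_work *w, void (*func)(struct sketch_work *))
{
	*w = SKETCH_WORK_INIT(func, 0);
}

/* Recover the enclosing struct from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_rq {
	int cpu;
	int deferred_runs;
	struct sketch_work deferred_work;
};

/* Mirrors the shape of deferred_irq_workfn(): work item -> owning rq. */
void sketch_deferred_workfn(struct sketch_work *w)
{
	struct sketch_rq *rq;

	assert(w != NULL);
	rq = sketch_container_of(w, struct sketch_rq, deferred_work);
	rq->deferred_runs++;	/* placeholder for the lock-protected work */
}
```

Because the function-style initializer takes no flags argument, choosing
hard-irq execution means assigning the result of the `_HARD` macro to
the embedded work item, which is the shape of the one-line change in the
diff.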
### 3. Stable Kernel Rules Evaluation
- **Fixes a real bug?** **Yes.** It fixes a possible deadlock on
PREEMPT_RT kernels that would hang the affected CPU.
- **Important issue?** **Yes.** Deadlocks are critical failures,
especially on Real-Time systems where reliability is paramount.
- **Obviously correct?** **Yes.** The fix is a one-line change using a
standard kernel macro specifically designed for this purpose.
- **Small and contained?** **Yes.** One line changed, no external
dependencies.
- **No new features?** **Yes.** This is a pure bug fix for existing
functionality.
### 4. Risk Assessment
- **Regression Risk:** **Very Low.** The change only affects the
execution context of the work item. On non-RT kernels, `irq_work`
often runs in hard-irq context anyway, so the behavior change is
minimal. On RT kernels, the work function now runs with interrupts
disabled, which is what its locking requires.
- **User Impact:** Users running `sched_ext` on Real-Time kernels are at
risk of random system freezes without this fix.
### Conclusion
This commit is a textbook example of stable material. It fixes a severe
bug (deadlock) in a supported feature (`sched_ext`) using a minimal,
well-understood solution. While it lacks a "Cc: stable" tag, the nature
of the bug (deadlock) and the surgical nature of the fix make it a
mandatory backport for all stable trees containing `sched_ext` (v6.12+).
**YES**
kernel/sched/ext.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index e1b502ef1243c..fa64fdb6e9796 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5280,7 +5280,7 @@ void __init init_sched_ext_class(void)
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_kick_if_idle, GFP_KERNEL, n));
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_preempt, GFP_KERNEL, n));
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_wait, GFP_KERNEL, n));
-		init_irq_work(&rq->scx.deferred_irq_work, deferred_irq_workfn);
+		rq->scx.deferred_irq_work = IRQ_WORK_INIT_HARD(deferred_irq_workfn);
 		init_irq_work(&rq->scx.kick_cpus_irq_work, kick_cpus_irq_workfn);
 		if (cpu_online(cpu))
--
2.51.0