Message-Id: <20251009-recheck_rt_task_enqueue_state-v1-1-5f9c96d3c4fd@oss.qualcomm.com>
Date: Thu, 09 Oct 2025 00:23:55 -0700
From: Tengfei Fan <tengfei.fan@....qualcomm.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
linux-arm-msm@...r.kernel.org
Cc: kernel@....qualcomm.com, linux-kernel@...r.kernel.org,
Tengfei Fan <tengfei.fan@....qualcomm.com>
Subject: [PATCH] sched: Recheck the rt task's on rq state after
double_lock_balance()
Recheck whether next_task is still queued on this_rq after this_rq and
lowest_rq have been locked via double_lock_balance() in push_rt_task().
This is necessary because double_lock_balance() may release
this_rq->lock before acquiring both this_rq->lock and lowest_rq->lock.
During that window next_task may already have been removed from
this_rq's runqueue, which leads to a double dequeue.
The double dequeue issue can occur in the following scenario:
1. Core0 call stack:
   autoremove_wake_function
   default_wake_function
   try_to_wake_up
   ttwu_do_activate
   task_woken_rt
   push_rt_task
   move_queued_task_locked
   dequeue_task
   __wake_up
2. Execution flow on Core0, Core1 and Core2 (Core0, Core1 and Core2 are
   contending for Core1's rq->lock):
   - Core1: enqueue next_task on Core1
   - Core0: lock Core1's rq->lock
            next_task = pick_next_pushable_task()
            unlock Core1's rq->lock via double_lock_balance()
   - Core1: lock Core1's rq->lock
            next_task = pick_next_task()
            unlock Core1's rq->lock
   - Core2: lock Core1's rq->lock in migration thread
   - Core1: running next_task
   - Core2: unlock Core1's rq->lock
   - Core1: lock Core1's rq->lock
            switch out and dequeue next_task
            unlock Core1's rq->lock
   - Core0: relock Core1's rq->lock from double_lock_balance(),
            but next_task has already been dequeued from Core1,
            causing the issue
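
The window described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code:
every model_* name is hypothetical, locks are modelled as plain flags,
and the interference from the other cores is replayed in a single
thread at the point where the lock is dropped. It shows why state read
before the unlock has to be revalidated once the locks are held again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct rq and struct task_struct. */
struct model_rq {
	bool locked;      /* models rq->lock as a simple flag */
	int nr_queued;    /* number of tasks queued on this runqueue */
};

struct model_task {
	struct model_rq *rq;  /* owning runqueue, in the spirit of task_rq() */
	bool on_rq;           /* queued state, in the spirit of task_on_rq_queued() */
};

static void model_lock(struct model_rq *rq)   { assert(!rq->locked); rq->locked = true; }
static void model_unlock(struct model_rq *rq) { assert(rq->locked);  rq->locked = false; }

static void model_dequeue(struct model_rq *rq, struct model_task *p)
{
	assert(rq->locked);
	assert(p->on_rq && p->rq == rq);  /* would fire on a double dequeue */
	p->on_rq = false;
	rq->nr_queued--;
}

static void model_enqueue(struct model_rq *rq, struct model_task *p)
{
	assert(rq->locked);
	p->rq = rq;
	p->on_rq = true;
	rq->nr_queued++;
}

/*
 * Models the unlock/relock window: drops this_rq's lock, optionally lets
 * "another core" dequeue the task, then returns with both locks held.
 */
static void model_double_lock(struct model_rq *this_rq, struct model_rq *lowest_rq,
			      struct model_task *p, bool interfere)
{
	model_unlock(this_rq);
	if (interfere) {
		/* Another core takes this_rq's lock and dequeues the task. */
		model_lock(this_rq);
		model_dequeue(this_rq, p);
		model_unlock(this_rq);
	}
	model_lock(lowest_rq);
	model_lock(this_rq);
}

/* Returns 1 if the task was pushed, 0 if the recheck rejected it. */
static int model_push_task(struct model_rq *this_rq, struct model_rq *lowest_rq,
			   struct model_task *next_task, bool interfere)
{
	int ret = 0;

	model_lock(this_rq);
	model_double_lock(this_rq, lowest_rq, next_task, interfere);

	/* Recheck: what was observed before the lock was dropped may be stale. */
	if (next_task->rq != this_rq || !next_task->on_rq)
		goto out;

	model_dequeue(this_rq, next_task);
	model_enqueue(lowest_rq, next_task);
	ret = 1;
out:
	model_unlock(lowest_rq);
	model_unlock(this_rq);
	return ret;
}
```

Without the recheck, the interfering case would reach model_dequeue()
a second time and trip its assertion, which is the model's analogue of
the double dequeue.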
Signed-off-by: Tengfei Fan <tengfei.fan@....qualcomm.com>
---
kernel/sched/rt.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7936d4333731..b4e44317a5de 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2037,6 +2037,14 @@ static int push_rt_task(struct rq *rq, bool pull)
goto retry;
}
+ /* Within find_lock_lowest_rq(), it's possible to first unlock the
+ * rq->lock of the runqueue containing next_task, and then re-lock
+ * it. During this window, the state of next_task might have changed.
+ */
+ if (unlikely(rq != task_rq(next_task) ||
+ !task_on_rq_queued(next_task)))
+ goto out;
+
move_queued_task_locked(rq, lowest_rq, next_task);
resched_curr(lowest_rq);
ret = 1;
---
base-commit: 7c3ba4249a3604477ea9c077e10089ba7ddcaa03
change-id: 20251008-recheck_rt_task_enqueue_state-e159aa6a2749
Best regards,
--
Tengfei Fan <tengfei.fan@....qualcomm.com>