Message-ID: <20250402124712.GN25239@noisy.programming.kicks-ass.net>
Date: Wed, 2 Apr 2025 14:47:12 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Harshit Agarwal <harshit@...anix.com>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, Jon Kohler <jon@...anix.com>,
Gauri Patwardhan <gauri.patwardhan@...anix.com>,
Rahul Chunduru <rahul.chunduru@...anix.com>,
Will Ton <william.ton@...anix.com>, stable@...r.kernel.org
Subject: Re: [PATCH v3] sched/rt: Fix race in push_rt_task
On Tue, Feb 25, 2025 at 06:05:53PM +0000, Harshit Agarwal wrote:
> Details
> =======
> Let's look at the following scenario to understand this race.
>
> 1) CPU A enters push_rt_task
> a) CPU A has chosen next_task = task p.
> b) CPU A calls find_lock_lowest_rq(Task p, CPU Z’s rq).
> c) CPU A identifies CPU X as a destination CPU (X < Z).
> d) CPU A enters double_lock_balance(CPU Z’s rq, CPU X’s rq).
> e) Since X is lower than Z, CPU A unlocks CPU Z’s rq. Someone else has
> locked CPU X’s rq, and thus, CPU A must wait.
>
> 2) At CPU Z
> a) Previous task has completed execution and thus, CPU Z enters
> schedule, locks its own rq after CPU A releases it.
> b) CPU Z dequeues previous task and begins executing task p.
> c) CPU Z unlocks its rq.
> d) Task p yields the CPU (e.g. by doing I/O or waiting to acquire a
> lock), which triggers the schedule function on CPU Z.
> e) CPU Z enters schedule again, locks its own rq, and dequeues task p.
> f) As part of dequeue, it sets p.on_rq = 0 and unlocks its rq.
>
> 3) At CPU B
> a) CPU B enters try_to_wake_up with input task p.
> b) Since CPU Z dequeued task p, p.on_rq = 0, and CPU B updates
> p.state = WAKING.
> c) CPU B via select_task_rq determines CPU Y as the target CPU.
>
> 4) The race
> a) CPU A acquires CPU X’s lock and relocks CPU Z.
> b) CPU A reads task p.cpu = Z and incorrectly concludes task p is
> still on CPU Z.
> c) CPU A fails to notice that task p was dequeued from CPU Z while
> CPU A was waiting for locks in double_lock_balance. If CPU A knew
> that task p had been dequeued, it would return NULL, forcing
> push_rt_task to give up on migrating task p.
> d) CPU B updates task p.cpu = Y and calls ttwu_queue.
> e) CPU B locks CPU Y’s rq. CPU B enqueues task p onto Y and sets task
> p.on_rq = 1.
> f) CPU B unlocks CPU Y, triggering memory synchronization.
> g) CPU A reads task p.on_rq = 1, cementing its assumption that task p
> has not migrated.
> h) CPU A decides to migrate p to CPU X.
>
> This leads to A dequeuing p from Y's queue and various crashes down the
> line.
>
> Solution
> ========
> The solution here is fairly simple. After obtaining the lock (at 4a),
> the check is enhanced to make sure that the task is still at the head of
> the pushable tasks list. If it is not, then it is not suitable for
> being pushed out anyway.
>
> Testing
> =======
> The fix was tested on a cluster of 3 nodes, where the panics due to this
> race are hit every couple of days. A fix similar to this was deployed on
> such a cluster and was stable for more than 30 days.
>
> Co-developed-by: Jon Kohler <jon@...anix.com>
> Signed-off-by: Jon Kohler <jon@...anix.com>
> Co-developed-by: Gauri Patwardhan <gauri.patwardhan@...anix.com>
> Signed-off-by: Gauri Patwardhan <gauri.patwardhan@...anix.com>
> Co-developed-by: Rahul Chunduru <rahul.chunduru@...anix.com>
> Signed-off-by: Rahul Chunduru <rahul.chunduru@...anix.com>
> Signed-off-by: Harshit Agarwal <harshit@...anix.com>
> Tested-by: Will Ton <william.ton@...anix.com>
> Reviewed-by: Steven Rostedt (Google) <rostedt@...dmis.org>
> Cc: stable@...r.kernel.org
> ---
Thanks, I've picked this up to land after -rc1.
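
For anyone following the scenario quoted above, here is a minimal
single-threaded sketch that replays steps 1-4 in order. It is a toy
userspace model, not the kernel code: toy_task, toy_rq, old_check,
new_check and replay_race are hypothetical names, old_check only models
the two separate reads of p.cpu and p.on_rq described in 4b and 4g, and
the assumption that the woken task becomes the head of CPU Y's pushable
list is made purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_task {
	int cpu;	/* models p.cpu */
	int on_rq;	/* models p.on_rq */
};

struct toy_rq {
	int cpu;
	struct toy_task *pushable_head;	/* head of the pushable tasks list */
};

enum { CPU_Y = 1, CPU_Z = 2 };

/* Models the two separate reads from steps 4b and 4g. */
static bool old_check(int cpu_seen, int on_rq_seen, const struct toy_rq *rq)
{
	return cpu_seen == rq->cpu && on_rq_seen;
}

/* Models the check from "Solution": p must still head rq's pushable list. */
static bool new_check(const struct toy_task *p, const struct toy_rq *rq)
{
	return rq->pushable_head == p;
}

/* Replays the scenario in one thread, in the order the steps are listed. */
static void replay_race(bool *old_ok, bool *new_ok)
{
	struct toy_task p = { .cpu = CPU_Z, .on_rq = 1 };
	struct toy_rq rq_z = { .cpu = CPU_Z, .pushable_head = &p };
	struct toy_rq rq_y = { .cpu = CPU_Y, .pushable_head = NULL };

	/* 1) CPU A picks p, then drops rq_z's lock in double_lock_balance. */

	/* 2) CPU Z runs p (p leaves the pushable list), then p blocks. */
	rq_z.pushable_head = NULL;
	p.on_rq = 0;

	/* 4a-4b) CPU A relocks rq_z and reads p.cpu before CPU B updates it. */
	int cpu_seen = p.cpu;

	/* 4d-4f) CPU B finishes the wakeup and enqueues p on CPU Y. */
	p.cpu = CPU_Y;
	rq_y.pushable_head = &p;	/* assumption: p heads Y's pushable list */
	p.on_rq = 1;

	/* 4g) CPU A reads p.on_rq only now. */
	int on_rq_seen = p.on_rq;

	*old_ok = old_check(cpu_seen, on_rq_seen, &rq_z);
	*new_ok = new_check(&p, &rq_z);
}
```

Calling replay_race(&old_ok, &new_ok) leaves old_ok true and new_ok
false: the two stale reads agree that p is still queued on CPU Z, while
the head-of-list comparison, which in the real code is made under CPU
Z's rq lock, notices that p is gone.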