[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250918150528.292620-1-sieberf@amazon.com>
Date: Thu, 18 Sep 2025 17:05:28 +0200
From: Fernand Sieber <sieberf@...zon.com>
To: <peterz@...radead.org>
CC: <bsegall@...gle.com>, <dietmar.eggemann@....com>, <dwmw@...zon.co.uk>,
<graf@...zon.com>, <jschoenh@...zon.de>, <juri.lelli@...hat.com>,
<linux-kernel@...r.kernel.org>, <mingo@...hat.com>, <sieberf@...zon.com>,
<tanghui20@...wei.com>, <vincent.guittot@...aro.org>,
<vineethr@...ux.ibm.com>, <wangtao554@...wei.com>, <zhangqiao22@...wei.com>
Subject: [PATCH v3] sched/fair: Forfeit vruntime on yield
If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.
If there's another runnable task as this point, the deadline will be
increased by the slice at each loop. This can cause the deadline to runaway
pretty quickly, and subsequent elevated run delays later on as the task
doesn't get picked again. The reason the scheduler can pick the same task
again and again despite its deadline increasing is because it may be the
only eligible task at that point.
Fix this by making the task forfeiting its remaining vruntime and pushing
the deadline one slice ahead. This implements yield behavior more
authentically.
We limit the forfeiting to eligible tasks. This is because core scheduling
prefers running ineligible tasks rather than force idling. As such, without
the condition, we can end up on a yield loop which makes the vruntime
increase rapidly, leading to anomalous run delays later down the line.
Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Signed-off-by: Fernand Sieber <sieberf@...zon.com>
Changes in v2:
- Implement vruntime forfeiting approach suggested by Peter Zijlstra
- Updated commit name
- Previous Reviewed-by tags removed due to algorithm change
Changes in v3:
- Only increase vruntime for eligible tasks to avoid runaway vruntime with
core scheduling
Link: https://lore.kernel.org/r/20250916140228.452231-1-sieberf@amazon.com
---
kernel/sched/fair.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b173a059315c..46e5a976f402 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8921,7 +8921,19 @@ static void yield_task_fair(struct rq *rq)
*/
rq_clock_skip_update(rq);
- se->deadline += calc_delta_fair(se->slice, se);
+ /*
+ * Forfeit the remaining vruntime, only if the entity is eligible. This
+ * condition is necessary because in core scheduling we prefer to run
+ * ineligible tasks rather than force idling. If this happens we may
+ * end up in a loop where the core scheduler picks the yielding task,
+ * which yields immediately again; without the condition the vruntime
+ * ends up quickly running away.
+ */
+ if (entity_eligible(cfs_rq, se)) {
+ se->vruntime = se->deadline;
+ se->deadline += calc_delta_fair(se->slice, se);
+ update_min_vruntime(cfs_rq);
+ }
}
static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
--
2.34.1
Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
Powered by blists - more mailing lists