Message-Id: <20240115105052.398761-1-khorenko@virtuozzo.com>
Date: Mon, 15 Jan 2024 13:50:52 +0300
From: Konstantin Khorenko <khorenko@...tuozzo.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Alexander Atanasov <alexander.atanasov@...tuozzo.com>,
linux-kernel@...r.kernel.org,
Konstantin Khorenko <khorenko@...tuozzo.com>
Subject: [PATCH v2 RESEND] sched/fair: Do not scan non-movable tasks several times

If the busiest rq is small (nr_running < SCHED_NR_MIGRATE_BREAK) and none
of its tasks are movable, detach_tasks() should not iterate more times
than there are tasks in the busiest rq.

Before commit b0defa7ae03e ("sched/fair: Make sure to try to detach at
least one movable task"), the (env->loop > env->loop_max) condition
prevented us from scanning non-movable tasks more than rq size times.
Now that the LBF_ALL_PINNED flag is checked as well, that bound no longer
holds in the "no task is movable" case.

Note: if no task in the rq could be moved in detach_tasks(), we always
increase loop_break by SCHED_NR_MIGRATE_BREAK, so we can step over
loop_max, but I think this is a rare case and it is not worth adding an
extra check against rq->nr_running here.

Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task")
Signed-off-by: Konstantin Khorenko <khorenko@...tuozzo.com>
---
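Illustration only, not part of the patch: a minimal user-space sketch of
the loop bound described above, assuming every task in the rq is pinned so
the LBF_ALL_PINNED flag never clears. The names model_scans_all_pinned(),
min_uint() and MODEL_NR_MIGRATE_BREAK are hypothetical, and the value 32 is
an assumed stand-in for SCHED_NR_MIGRATE_BREAK. The model only counts how
many task scans a single pass performs before it breaks out.

```c
#include <assert.h>

/* Assumed illustrative value; the kernel defines its own SCHED_NR_MIGRATE_BREAK. */
#define MODEL_NR_MIGRATE_BREAK 32U

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model of one detach_tasks()-like pass in which no task
 * can be migrated. Pinned tasks stay on the list, so the list never
 * empties and only the loop counters can end the pass.
 * Returns the number of task scans performed.
 */
static unsigned int model_scans_all_pinned(unsigned int nr_running,
					   unsigned int loop_max,
					   unsigned int loop_break)
{
	unsigned int loop = 0, scans = 0;
	const int all_pinned = 1;	/* stays set: nothing is movable */

	assert(nr_running > 0);		/* the model needs a non-empty rq */

	for (;;) {
		loop++;

		/* loop_max only stops the pass once a movable task was seen */
		if (loop > loop_max && !all_pinned)
			break;

		/* the periodic "breather" is the only remaining exit */
		if (loop > loop_break)
			break;

		scans++;		/* one can-this-task-migrate check */
	}

	return scans;
}
```

With nr_running = 3 and loop_max = 3, the model performs 32 scans when
loop_break is MODEL_NR_MIGRATE_BREAK, and 3 scans when loop_break is
min_uint(MODEL_NR_MIGRATE_BREAK, 3), which mirrors the clamp this patch
adds.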
kernel/sched/fair.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 533547e3c90a..920fb16e6e2f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11277,7 +11277,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
.dst_rq = this_rq,
.dst_grpmask = group_balance_mask(sd->groups),
.idle = idle,
- .loop_break = SCHED_NR_MIGRATE_BREAK,
.cpus = cpus,
.fbq_type = all,
.tasks = LIST_HEAD_INIT(env.tasks),
@@ -11324,6 +11323,14 @@ static int load_balance(int this_cpu, struct rq *this_rq,
*/
env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
+more_balance_reset_break:
+ /*
+ * If busiest rq is small, nr_running < SCHED_NR_MIGRATE_BREAK
+ * and all tasks are not movable, detach_tasks() should not
+ * iterate more than tasks available in rq.
+ */
+ env.loop_break = min(SCHED_NR_MIGRATE_BREAK, busiest->nr_running);
+
more_balance:
rq_lock_irqsave(busiest, &rf);
update_rq_clock(busiest);
@@ -11386,13 +11393,12 @@ static int load_balance(int this_cpu, struct rq *this_rq,
env.dst_cpu = env.new_dst_cpu;
env.flags &= ~LBF_DST_PINNED;
env.loop = 0;
- env.loop_break = SCHED_NR_MIGRATE_BREAK;
/*
* Go back to "more_balance" rather than "redo" since we
* need to continue with same src_cpu.
*/
- goto more_balance;
+ goto more_balance_reset_break;
}
/*
@@ -11418,7 +11424,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
*/
if (!cpumask_subset(cpus, env.dst_grpmask)) {
env.loop = 0;
- env.loop_break = SCHED_NR_MIGRATE_BREAK;
goto redo;
}
goto out_all_pinned;
--
2.39.3