Date:   Thu, 18 Aug 2022 10:31:33 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Abel Wu <wuyun.abel@...edance.com>
Cc:     "zhangsong (J)" <zhangsong34@...wei.com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce
 interference from IDLE tasks

On Thursday 18 Aug 2022 at 10:46:55 (+0800), Abel Wu wrote:
> On 8/17/22 8:58 PM, Vincent Guittot wrote:
> > On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@...wei.com> wrote:
> > > 
> > > 

...

> > > Yes, this is usually a corner case, but suppose that some non-idle tasks are bound
> > > to CPUs 1-2 and idle tasks are bound to CPUs 0-1, so CPU 1 may have many idle tasks
> > > and some non-idle tasks, while the idle tasks on CPU 1 cannot be pulled to CPU 2.
> > > When load balance triggers and CPU 2 should pull some tasks from CPU 1, the bad
> > > result is that the idle tasks of CPU 1 cannot be migrated, and the non-idle tasks
> > > also cannot be migrated because of the env->loop_max constraint.
> > 
> > env->loop_max adds a break, but load_balance will continue with the next
> > tasks, so it also tries to pull your non-idle task at the end after
> > several breaks.
> 
> The loop will be terminated without LBF_NEED_BREAK being set if env->loop exceeds loop_max :)

Argh yes, my brain is not yet back from vacation.
I was confused by loop_max and loop_break being set to the same value, 32.

Zhang Song, could you try the patch below? If it works, I will prepare a
clean patch with all the tags.



sched/fair: make sure to try to detach at least one movable task

During load balance, we try at most env->loop_max times to move a task.
But it can happen that the LRU tasks (i.e. the tail of the cfs_tasks
list) can't be moved to dst_cpu because of affinity. In this case, loop
in the list until we find at least one movable task.

Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
---
 kernel/sched/fair.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da388657d5ac..02b7b808e186 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8052,8 +8052,12 @@ static int detach_tasks(struct lb_env *env)
 		p = list_last_entry(tasks, struct task_struct, se.group_node);

 		env->loop++;
-		/* We've more or less seen every task there is, call it quits */
-		if (env->loop > env->loop_max)
+		/*
+		 * We've more or less seen every task there is, call it quits
+		 * unless we haven't found any movable task yet.
+		 */
+		if (env->loop > env->loop_max &&
+		    !(env->flags & LBF_ALL_PINNED))
 			break;

 		/* take a breather every nr_migrate tasks */
@@ -10182,7 +10186,9 @@ static int load_balance(int this_cpu, struct rq *this_rq,

 		if (env.flags & LBF_NEED_BREAK) {
 			env.flags &= ~LBF_NEED_BREAK;
-			goto more_balance;
+			/* Stop if we tried all running tasks */
+			if (env.loop < busiest->nr_running)
+				goto more_balance;
 		}

 		/*
--
2.17.1

> 
> > 
> > > 
> > > This prevents the non-idle tasks from achieving higher CPU utilization.
> > 
> > Your problem is not linked to IDLE vs NORMAL tasks but to the large
> > number of pinned tasks that can't migrate to CPU 2. You can end up with
> > the same behavior without using IDLE tasks but only NORMAL tasks.
> 
> I feel the same way.
> 
> Best,
> Abel
