Message-ID: <22ba6771-8bde-8c6e-65e0-4ab2ebc6e018@arm.com>
Date: Thu, 1 Aug 2019 17:27:50 +0100
From: Valentin Schneider <valentin.schneider@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel@...r.kernel.org, mingo@...hat.com,
peterz@...radead.org
Cc: pauld@...hat.com, srikar@...ux.vnet.ibm.com,
quentin.perret@....com, dietmar.eggemann@....com,
Morten.Rasmussen@....com
Subject: Re: [PATCH v2 8/8] sched/fair: use utilization to select misfit task
On 01/08/2019 15:40, Vincent Guittot wrote:
> @@ -8261,7 +8261,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> * If we have more than one misfit sg go with the
> * biggest misfit.
> */
> - if (sgs->group_misfit_task_load < busiest->group_misfit_task_load)
> + if (sgs->group_misfit_task_util < busiest->group_misfit_task_util)
> return false;
I previously said this change would render the maximization useless, but I
had forgotten one thing: with PELT time scaling, task utilization can go
above its CPU's capacity.
So if you have two LITTLE CPUs each running a busy loop (i.e. a misfit
task), the one that has been running the longest will have the highest
utilization (assuming neither has reached 1024 yet). In other words, the
claim that "utilizations above the capacity_margin can't be compared"
doesn't really stand.
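To make that concrete, here's a toy, self-contained model of time-scaled
PELT (a sketch under assumptions: the closed-form ramp and the capacity
value 446 are made up for illustration; this is *not* the kernel's
implementation):

#include <stdio.h>
#include <math.h>

#define SCHED_CAPACITY_SCALE	1024.0
#define PELT_HALFLIFE_MS	32.0	/* PELT half-life */

/*
 * Toy model: with time scaling, each wall-clock ms of running on a CPU
 * of capacity 'cap' advances the PELT clock by cap/1024 ms, during
 * which the task counts as 100% busy. Utilization therefore ramps
 * towards 1024 on *any* CPU, just more slowly on smaller ones.
 */
static double pelt_util(double cap, double runtime_ms)
{
	double scaled_ms = runtime_ms * cap / SCHED_CAPACITY_SCALE;

	/* closed-form ramp for an always-running task starting at 0 */
	return SCHED_CAPACITY_SCALE *
	       (1.0 - pow(0.5, scaled_ms / PELT_HALFLIFE_MS));
}

int main(void)
{
	double little_cap = 446.0;	/* made-up LITTLE capacity */

	/* two busy loops, one started earlier than the other */
	printf("100ms busy loop: util ~= %4.0f\n", pelt_util(little_cap, 100.0));
	printf("400ms busy loop: util ~= %4.0f\n", pelt_util(little_cap, 400.0));
	return 0;
}

(Build with "gcc pelt_toy.c -lm"; both values end up above the LITTLE
capacity, with the older busy loop ahead.)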
Still, maximizing load would get us there just as well. Furthermore, if
we have to pick between two rqs with misfit tasks, I still believe we
should pick the one with the highest load, not the highest utilization:
two misfit tasks can both saturate at (or near) 1024 utilization yet
differ in load (e.g. via nice), and load is what tells them apart.
We could keep load and fix the issue of detaching the wrong task with
something like:
-----8<-----
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 53e64a7b2ae0..bfc2b624ee98 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7489,12 +7489,8 @@ static int detach_tasks(struct lb_env *env)
case migrate_misfit:
load = task_h_load(p);
- /*
- * utilization of misfit task might decrease a bit
- * since it has been recorded. Be conservative in the
- * condition.
- */
- if (load < env->imbalance)
+ /* This is not a misfit task */
+ if (task_fits_capacity(p, capacity_of(env->src_cpu)))
goto next;
env->imbalance = 0;
----->8-----
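For reference, the check being reused here boils down to a ~20% headroom
comparison; paraphrasing the fair.c of this era (double-check against
your tree; capacity_margin is 1280 here):

static inline int task_fits_capacity(struct task_struct *p, long capacity)
{
	/* fits if util, inflated by 1280/1024 (~1.25), is below capacity */
	return capacity * 1024 > task_util_est(p) * capacity_margin;
}

So a task whose utilization has shrunk back under the margin since being
flagged gets skipped, rather than relying on the load-vs-imbalance proxy.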
However, what would be *even* better IMO would be:
-----8<-----
@@ -8853,6 +8853,7 @@ voluntary_active_balance(struct lb_env *env)
return 1;
}
+ /* XXX: make sure current is still a misfit task? */
if (env->balance_type == migrate_misfit)
return 1;
@@ -8966,6 +8967,20 @@ static int load_balance(int this_cpu, struct rq *this_rq,
env.src_rq = busiest;
ld_moved = 0;
+
+ /*
+ * Misfit tasks are only misfit if they are currently running, see
+ * update_misfit_status().
+ *
+ * - If they're not running, we'll get an opportunity at wakeup to
+ * migrate them to a bigger CPU.
+ * - If they're running, we need to active balance them to a bigger CPU.
+ *
+ * Do the smart thing and go straight for active balance.
+ */
+ if (env->balance_type == migrate_misfit)
+ goto active_balance;
+
if (busiest->nr_running > 1) {
/*
* Attempt to move tasks. If find_busiest_group has found
@@ -9074,7 +9089,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
goto out_all_pinned;
}
}
-
+active_balance:
if (!ld_moved) {
schedstat_inc(sd->lb_failed[idle]);
/*
----->8-----
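For context, here is the rough shape of what the active_balance: label
leads into (simplified, locking and stat accounting elided, so not
verbatim), which is why the goto is enough:

	if (need_active_balance(&env)) {	/* voluntary_active_balance()
						 * now returns 1 for
						 * migrate_misfit */
		if (!busiest->active_balance) {
			busiest->active_balance = 1;
			busiest->push_cpu = this_cpu;
			/* kick the stopper to push the *running* task away */
			stop_one_cpu_nowait(cpu_of(busiest),
					    active_load_balance_cpu_stop,
					    busiest,
					    &busiest->active_balance_work);
		}
	}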