linux-kernel - Re: [PATCH v3 4/5] sched/pelt: Add a new runnable average signal

Message-ID: <9fe822fc-c311-2b97-ae14-b9269dd99f1e@arm.com>
Date:   Wed, 19 Feb 2020 20:10:38 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, linux-kernel@...r.kernel.org
Cc:     pauld@...hat.com, parth@...ux.ibm.com, hdanton@...a.com
Subject: Re: [PATCH v3 4/5] sched/pelt: Add a new runnable average signal

On 19/02/2020 12:55, Vincent Guittot wrote:
> @@ -740,8 +740,10 @@ void init_entity_runnable_average(struct sched_entity *se)
>  	 * Group entities are initialized with zero load to reflect the fact that
>  	 * nothing has been attached to the task group yet.
>  	 */
> -	if (entity_is_task(se))
> +	if (entity_is_task(se)) {
> +		sa->runnable_avg = SCHED_CAPACITY_SCALE;

So this is a comment that's more related to patch 5, but the relevant bit is
here. I'm thinking this initialization might be too aggressive wrt load
balance. This will also give different results between symmetric vs
asymmetric topologies - a single fork() will make a LITTLE CPU group (at the
base domain level) overloaded straight away. That won't happen for bigs or on
symmetric topologies because

  // group_is_overloaded()
  (sgs->group_capacity * imbalance_pct) < (sgs->group_runnable * 100)

will be false - it would take more than one task for that to happen (due to
the imbalance_pct).
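
To make the asymmetry concrete, here is a minimal standalone sketch of that
comparison. It models only the runnable check quoted above, not the rest of
group_is_overloaded(). The LITTLE capacity (446) and the imbalance_pct (117)
are illustrative assumptions, not values taken from the patch, and
runnable_overloaded() is a hypothetical helper name.

```c
#include <stdbool.h>

/* Capacity of the biggest CPU, as in the kernel. */
#define SCHED_CAPACITY_SCALE 1024UL
/* Assumed values, for illustration only. */
#define LITTLE_CAPACITY      446UL  /* hypothetical LITTLE CPU capacity */
#define IMBALANCE_PCT        117U   /* hypothetical base-domain imbalance_pct */

/*
 * Hypothetical helper: models only the runnable comparison quoted above.
 * At the base domain level a group is a single CPU, so group_capacity is
 * taken to be that CPU's capacity.
 */
static bool runnable_overloaded(unsigned long group_capacity,
				unsigned long group_runnable,
				unsigned int imbalance_pct)
{
	return (group_capacity * imbalance_pct) < (group_runnable * 100);
}
```

With these assumed numbers, one freshly forked task at
runnable_avg = SCHED_CAPACITY_SCALE gives 446 * 117 = 52182 < 102400 on the
LITTLE, so the check is true. On a big CPU, 1024 * 117 = 119808 is not below
102400, so the check is false; a second such task (204800) makes it true.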

So maybe what we want here instead is to mimic what we have for utilization,
i.e. initialize to half the spare capacity of the local CPU. IOW,
conceptually something like this:

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 99249a2484b4..762717092235 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -740,10 +740,8 @@ void init_entity_runnable_average(struct sched_entity *se)
 	 * Group entities are initialized with zero load to reflect the fact that
 	 * nothing has been attached to the task group yet.
 	 */
-	if (entity_is_task(se)) {
-		sa->runnable_avg = SCHED_CAPACITY_SCALE;
+	if (entity_is_task(se))
 		sa->load_avg = scale_load_down(se->load.weight);
-	}
 
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
@@ -796,6 +794,8 @@ void post_init_entity_util_avg(struct task_struct *p)
 		}
 	}
 
+	sa->runnable_avg = sa->util_avg;
+
 	if (p->sched_class != &fair_sched_class) {
 		/*
 		 * For !fair tasks do:
---
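
As a rough illustration of what seeding runnable_avg from util_avg would
yield, the sketch below is a simplified userspace stand-in for the util_avg
seeding done in post_init_entity_util_avg(): the new task gets at most half
of the CPU's spare capacity. initial_util_avg() and its parameters are
hypothetical names, and the model deliberately ignores everything else that
function does.

```c
/*
 * Hypothetical, simplified model of how a new task's util_avg is seeded.
 * All arguments are plain numbers standing in for the kernel's per-CPU and
 * per-cfs_rq values.
 */
static long initial_util_avg(long cpu_capacity, long rq_util_avg,
			     long rq_load_avg, long task_weight)
{
	/* Half of the capacity not already used on this CPU. */
	long cap = (cpu_capacity - rq_util_avg) / 2;
	long util;

	if (cap <= 0)
		return 0;	/* no spare capacity: start from zero */

	if (rq_util_avg == 0)
		return cap;	/* idle CPU: take half of its capacity */

	/* Busy CPU: scale by the task's share of the load, clamped to cap. */
	util = rq_util_avg * task_weight / (rq_load_avg + 1);
	return util > cap ? cap : util;
}
```

Under this model, a task forked on an idle LITTLE CPU of (assumed) capacity
446 would start with runnable_avg = util_avg = 223. With an assumed
imbalance_pct of 117, 446 * 117 = 52182 is not below 223 * 100 = 22300, so a
single fork() would no longer trip the runnable check on its own.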

The current approach has the merit of hinting to the load balancer that
there is a bunch of new tasks it could spread out, but I fear it is too
aggressive.

>  		sa->load_avg = scale_load_down(se->load.weight);
> +	}
>  
>  	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>  }