Message-ID: <f6443ca2-3e82-d4eb-deb8-2aff44c618e2@arm.com>
Date:   Mon, 17 Dec 2018 16:56:18 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>, peterz@...radead.org,
        mingo@...nel.org, linux-kernel@...r.kernel.org
Cc:     Morten.Rasmussen@....com
Subject: Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance
 interval

Hi Vincent,

About time I had a look at this one...

On 14/12/2018 16:01, Vincent Guittot wrote:
> In case of active balance, we increase the balance interval to cover
> pinned task cases not covered by the all_pinned logic.

AFAIUI the balance interval increase is there to leave plenty of time to
stop the task before going through another load_balance().

Seeing as there is a cpus_allowed check that leads to

    env.flags |= LBF_ALL_PINNED;
    goto out_one_pinned;

in the active balancing part of load_balance(), the condition you're
changing should never be hit when we have pinned tasks. So you may want
to rephrase that bit.
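
For reference, the check I'm referring to looks roughly like this (quoting
from memory, so take the exact names with a pinch of salt):

    /*
     * Don't kick the active_load_balance_cpu_stop if the curr task
     * on the busiest rq can't be moved to this_cpu:
     */
    if (!cpumask_test_cpu(this_cpu, &busiest->curr->cpus_allowed)) {
        raw_spin_unlock_irqrestore(&busiest->lock, flags);
        env.flags |= LBF_ALL_PINNED;
        goto out_one_pinned;
    }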

> Nevertheless, the active
> migration triggered by asym packing should be treated as the normal
> unbalanced case and reset the interval to the default value; otherwise,
> active migration for asym_packing can easily be delayed for hundreds of ms
> because of this all_pinned detection mechanism.

Mmm, so it's not exactly clear to me why we need this change. If we double the
interval because of a pinned task we wanted to active balance, that's just
the regular pinned-task issue and we can't do much about it.

The only scenario I can think of is a workload where you want to do an active
balance in several successive load_balance() calls, in which case you would
keep increasing the interval even though you do migrate a task every time
(which would harm every subsequent active_balance).
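
To put rough numbers on that: with e.g. sd_weight = 8, sd_init() sets
min_interval = 8ms and max_interval = 16ms (IIRC), and once the CPU is seen
as busy get_sd_balance_interval() further multiplies the interval by
busy_factor (32):

    8ms  (min_interval)        * 32 = 256ms
    16ms (after one doubling)  * 32 = 512ms

which I assume is where the "hundreds of ms" in the changelog comes from.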

In that case every active_balance "user" (pressured CPU, misfit) is
affected, so maybe what we want is something like this:

-----8<-----
@@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
                sd->balance_interval = sd->min_interval;
        } else {
                /*
-                * If we've begun active balancing, start to back off. This
-                * case may not be covered by the all_pinned logic if there
-                * is only 1 task on the busy runqueue (because we don't call
-                * detach_tasks).
+                * If we've begun active balancing, start to back off.
+                * Don't increase too much in case we have more active balances
+                * coming up.
                 */
-               if (sd->balance_interval < sd->max_interval)
-                       sd->balance_interval *= 2;
+               sd->balance_interval = 2 * sd->min_interval;
        }
 
        goto out;
----->8-----

Maybe we want a larger threshold - truth be told, it all depends on how long
the cpu stopper can take and on whether that delay increase is still relevant
nowadays.
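
(For instance, purely as a sketch, something like:

    sd->balance_interval = 4 * sd->min_interval;

possibly capped at sd->max_interval - but the right factor really hinges on
that stopper latency.)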

> 
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
>  kernel/sched/fair.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9591e7a..487287e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>   */
>  #define MAX_PINNED_INTERVAL	512
>  
> +static inline bool
> +asym_active_balance(struct lb_env *env)
> +{
> +	/*
> +	 * ASYM_PACKING needs to force migrate tasks from busy but
> +	 * lower priority CPUs in order to pack all tasks in the
> +	 * highest priority CPUs.
> +	 */
> +	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> +	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +}
> +
>  static int need_active_balance(struct lb_env *env)
>  {
>  	struct sched_domain *sd = env->sd;
>  
> -	if (env->idle != CPU_NOT_IDLE) {
> -
> -		/*
> -		 * ASYM_PACKING needs to force migrate tasks from busy but
> -		 * lower priority CPUs in order to pack all tasks in the
> -		 * highest priority CPUs.
> -		 */
> -		if ((sd->flags & SD_ASYM_PACKING) &&
> -		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
> -			return 1;
> -	}
> +	if (asym_active_balance(env))
> +		return 1;
>  
>  	/*
>  	 * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  	} else
>  		sd->nr_balance_failed = 0;
>  
> -	if (likely(!active_balance)) {
> +	if (likely(!active_balance) || asym_active_balance(&env)) {

AFAICT this makes us reset the interval whenever we do an asym packing
active balance (if the task is pinned we would goto out_one_pinned).
This goes against the logic I had understood so far, which I explained
higher up.

>  		/* We were unbalanced, so reset the balancing interval */
>  		sd->balance_interval = sd->min_interval;
>  	} else {
> 
