Message-ID: <383de0cc-ad84-4cc1-48d6-512e7d3ddaa8@arm.com>
Date: Thu, 19 Dec 2019 11:56:09 +0000
From: Valentin Schneider <valentin.schneider@....com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, pauld@...hat.com,
srikar@...ux.vnet.ibm.com, quentin.perret@....com,
dietmar.eggemann@....com, Morten.Rasmussen@....com,
hdanton@...a.com, parth@...ux.ibm.com, riel@...riel.com,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small degree of load imbalance between SD_NUMA domains
On 18/12/2019 22:50, Mel Gorman wrote:
>> I'm quite sure you have reasons to have written it that way, but I was
>> hoping we could squash it down to something like:
>
> I wrote it that way to make it clear exactly what has changed, the
> thinking behind the checks and to avoid 80-col limits to make review
> easier overall. It's a force of habit and I'm happy to reformat it as
> you suggest except....
>
I tend to disregard the 80 col limit, so I might not be the best example
here :D
>> ---
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 08a233e97a01..f05d09a8452e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8680,16 +8680,27 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>> env->migration_type = migrate_task;
>> lsub_positive(&nr_diff, local->sum_nr_running);
>> env->imbalance = nr_diff >> 1;
>> - return;
>> + } else {
>> +
>> + /*
>> + * If there is no overload, we just want to even the number of
>> + * idle cpus.
>> + */
>> + env->migration_type = migrate_task;
>> + env->imbalance = max_t(long, 0, (local->idle_cpus -
>> + busiest->idle_cpus) >> 1);
>> }
>>
>> /*
>> - * If there is no overload, we just want to even the number of
>> - * idle cpus.
>> + * Allow for a small imbalance between NUMA groups; don't do any
>> + * of it if there is at least half as many tasks / busy CPUs as
>> + * there are available CPUs in the busiest group
>> */
>> - env->migration_type = migrate_task;
>> - env->imbalance = max_t(long, 0, (local->idle_cpus -
>> - busiest->idle_cpus) >> 1);
>> + if (env->sd->flags & SD_NUMA &&
>> + (busiest->sum_nr_running < busiest->group_weight >> 1) &&
>
> This last line is not exactly equivalent to what I wrote. It would need
> to be
>
> (busiest->sum_nr_running < (busiest->group_weight >> 1) - imbalance_adj) &&
>
Right, I was implicitly suggesting that maybe we could forgo the
imbalance_adj computation and just roll with the imbalance_pct (with perhaps
an extra shift here and there). IMO the important thing here is the
half-way cutoff.
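To make the difference between the two checks concrete, here is a rough userspace sketch, not kernel code. The struct and helper names are made up for illustration, and 'adj' merely stands in for imbalance_adj, whose actual computation isn't shown in this thread. Note that '>>' binds tighter than '<' but looser than '-', which is why the adjusted form needs the extra parentheses:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two per-group stats the check reads. */
struct group_stats {
	unsigned int sum_nr_running;	/* tasks running in the group */
	unsigned int group_weight;	/* CPUs in the group */
};

/* Plain half-way cutoff: fewer tasks than half the group's CPUs. */
static bool below_half(const struct group_stats *g)
{
	return g->sum_nr_running < (g->group_weight >> 1);
}

/*
 * Same cutoff pulled in by 'adj' (an assumed stand-in for imbalance_adj).
 * The 'adj < half' guard is an addition of this sketch to avoid unsigned
 * underflow; it gives the same answer as signed arithmetic would.
 */
static bool below_half_adj(const struct group_stats *g, unsigned int adj)
{
	unsigned int half = g->group_weight >> 1;

	return adj < half && g->sum_nr_running < half - adj;
}
```

With 3 tasks on an 8-CPU group, below_half() still allows the imbalance whereas below_half_adj() with adj = 1 does not, so the two only disagree right next to the 50% mark.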
> I can test as you suggest to see if it's roughly equivalent in terms of
> performance. The intent was to have a cutoff just before we reached 50%
> running tasks / busy CPUs.
>
I think that cutoff makes sense; it's also important that it isn't purely
busy CPU-based because we're not guaranteed to have 1 task per CPU (due to
affinity or otherwise), so I think the "half as many tasks as available CPUs"
thing has some merit.
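As an illustration of why the two metrics can disagree, here is a small userspace sketch. It is not the kernel code, and the struct and helper names are hypothetical. It compares a task-count cutoff against a busy-CPU cutoff for a group where affinity has stacked several tasks on few CPUs:

```c
#include <stdbool.h>

/* Hypothetical per-group snapshot; the field names are illustrative only. */
struct group_load {
	unsigned int nr_tasks;	/* running tasks, akin to sum_nr_running */
	unsigned int busy_cpus;	/* CPUs with at least one running task */
	unsigned int weight;	/* total CPUs in the group */
};

/* Cutoff based on the number of tasks. */
static bool under_half_by_tasks(const struct group_load *g)
{
	return g->nr_tasks < (g->weight >> 1);
}

/* Cutoff based purely on the number of busy CPUs. */
static bool under_half_by_busy_cpus(const struct group_load *g)
{
	return g->busy_cpus < (g->weight >> 1);
}
```

Take an 8-CPU group with 4 tasks confined to 2 CPUs: the busy-CPU check sees 2 < 4 and would still tolerate the imbalance, while the task-based check sees 4 tasks and stops tolerating it.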