Message-ID: <9b97fa6f-2b3d-3e10-fd55-208ef47f3e2a@arm.com>
Date: Mon, 15 May 2017 15:56:20 +0100
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Jeffrey Hugo <jhugo@...eaurora.org>,
Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
Austin Christ <austinwc@...eaurora.org>,
Tyler Baicar <tbaicar@...eaurora.org>
Subject: Re: [RFC 1/2] sched/fair: Fix load_balance() affinity redo path
On 12/05/17 21:57, Jeffrey Hugo wrote:
> On 5/12/2017 2:47 PM, Peter Zijlstra wrote:
>> On Fri, May 12, 2017 at 11:01:37AM -0600, Jeffrey Hugo wrote:
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index d711093..8f783ba 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8219,8 +8219,19 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>>  	/* All tasks on this runqueue were pinned by CPU affinity */
>>>  	if (unlikely(env.flags & LBF_ALL_PINNED)) {
>>> +		struct cpumask tmp;
>>
>> You cannot have cpumasks on the stack.
>
> Well, we need a temp variable to store the intermediate values since the
> cpumask_* operations are somewhat limited, and require a "storage"
> parameter.
>
> Do you have any suggestions to meet all of these requirements?
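FWIW, the usual way around the on-stack objection is cpumask_var_t,
which is heap-allocated when CONFIG_CPUMASK_OFFSTACK=y. A minimal
(untested) sketch, with GFP_ATOMIC since load_balance() can be called
from softirq context, and a hypothetical bail-out path:

	cpumask_var_t tmp;

	if (!alloc_cpumask_var(&tmp, GFP_ATOMIC))
		return 0;	/* hypothetical error path */

	/* e.g. candidate CPUs outside our own sched_group */
	cpumask_andnot(tmp, cpus, env.dst_grpmask);
	...
	free_cpumask_var(tmp);

But allocating on every (re)balance attempt is unattractive in this
path, so let's try to avoid the temporary mask altogether.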
What about using env.dst_grpmask and checking whether cpus is still a
subset of it, i.e. redo only while at least one candidate CPU outside
our own group is left? In that case we have to get rid of setting
env.dst_grpmask = NULL for CPU_NEWLY_IDLE, which is IMHO not an issue
since idle is passed via env into can_migrate_task() and can be
checked there instead.

cpus also has to be ANDed with sched_domain_span(env.sd).

I'm not sure whether this works with 'not fully connected NUMA'
(SD_OVERLAP) though ...
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a903276fcb62..2ede4c1c9db8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6737,10 +6737,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		 * our sched_group. We may want to revisit it if we couldn't
 		 * meet load balance goals by pulling other tasks on src_cpu.
 		 *
-		 * Also avoid computing new_dst_cpu if we have already computed
-		 * one in current iteration.
+		 * Avoid computing new_dst_cpu for NEWLY_IDLE or if we have
+		 * already computed one in current iteration.
 		 */
-		if (!env->dst_grpmask || (env->flags & LBF_DST_PINNED))
+		if (env->idle == CPU_NEWLY_IDLE || (env->flags & LBF_DST_PINNED))
 			return 0;
 
 		/* Prevent to re-select dst_cpu via env's cpus */
@@ -8091,14 +8091,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		.tasks		= LIST_HEAD_INIT(env.tasks),
 	};
 
-	/*
-	 * For NEWLY_IDLE load_balancing, we don't need to consider
-	 * other cpus in our group
-	 */
-	if (idle == CPU_NEWLY_IDLE)
-		env.dst_grpmask = NULL;
-
-	cpumask_copy(cpus, cpu_active_mask);
+	cpumask_and(cpus, cpu_active_mask, sched_domain_span(env.sd));
 
 	schedstat_inc(sd->lb_count[idle]);
 
@@ -8220,7 +8213,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	/* All tasks on this runqueue were pinned by CPU affinity */
 	if (unlikely(env.flags & LBF_ALL_PINNED)) {
 		cpumask_clear_cpu(cpu_of(busiest), cpus);
-		if (!cpumask_empty(cpus)) {
+		if (!cpumask_subset(cpus, env.dst_grpmask)) {
 			env.loop = 0;
 			env.loop_break = sched_nr_migrate_break;
 			goto redo;
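To illustrate the new redo condition: retrying only makes sense while
at least one candidate CPU outside the destination group is left. A
toy user-space model of the cpumask_subset() test (plain C, made-up
masks, obviously not kernel code):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/* !cpumask_subset(cpus, dst_grpmask) <=> (cpus & ~dst_grpmask) != 0 */
	static bool should_redo(uint64_t cpus, uint64_t dst_grpmask)
	{
		return (cpus & ~dst_grpmask) != 0;
	}

	int main(void)
	{
		uint64_t dst_grp = 0x0f;	/* dst group: CPUs 0-3 */

		/* CPUs 4-7 still usable as busiest candidates -> redo */
		printf("%d\n", should_redo(0xf0, dst_grp));	/* prints 1 */

		/* only CPUs of our own group left -> give up */
		printf("%d\n", should_redo(0x03, dst_grp));	/* prints 0 */

		return 0;
	}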