Message-ID: <5568D43D.20703@fb.com>
Date: Fri, 29 May 2015 17:03:57 -0400
From: Josef Bacik <jbacik@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <riel@...hat.com>, <mingo@...hat.com>,
<linux-kernel@...r.kernel.org>, <umgwanakikbuti@...il.com>,
<morten.rasmussen@....com>, kernel-team <Kernel-team@...com>
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for
BALANCE_WAKE
On 05/28/2015 07:05 AM, Peter Zijlstra wrote:
>
> So maybe you want something like the below; that cures the thing Morten
> raised, and we continue looking for sd, even after we found affine_sd.
>
> It also avoids the pointless idle_cpu() check Mike raised by making
> select_idle_sibling() return -1 if it doesn't find anything.
>
> Then it continues doing the full balance IFF sd was set, which is keyed
> off of sd->flags.
>
> And note (as Mike already said), BALANCE_WAKE does _NOT_ look for idle
> CPUs, it looks for the least loaded CPU. And its damn expensive.
>
>
> Rewriting this entire thing is somewhere on the todo list :/
>
Summarizing what I've found so far:
-We turn on SD_BALANCE_WAKE on our domains for our 3.10 boxes, but not
for our 4.0 boxes (due to some weird configuration issue).
-Running with this patch is better than plain 4.0 but not as good as my
patch. Running with SD_BALANCE_WAKE set or unset makes no difference
to the runs.
-I took out the sd = NULL; bit from the affine case like you said on IRC
and I get similar results as before.
-I'm thoroughly confused as to why my patch did anything, since we
weren't turning on SD_BALANCE_WAKE on 4.0 in my previous runs (I assume;
it isn't set now, so I'm pretty sure the problem has always been there).
That means we should always have had sd == NULL, so we would only ever
have gotten the task cpu, I guess.
Now I'm looking at the code in select_idle_sibling, and we do this:

	for_each_lower_domain(sd) {
		sg = sd->groups;
		do {
			if (!cpumask_intersects(sched_group_cpus(sg),
						tsk_cpus_allowed(p)))
				goto next;

			for_each_cpu(i, sched_group_cpus(sg)) {
				if (i == target || !idle_cpu(i))
					goto next;
			}

			return cpumask_first_and(sched_group_cpus(sg),
						 tsk_cpus_allowed(p));
next:
			sg = sg->next;
		} while (sg != sd->groups);
	}

We walk all the scheduling groups of the scheduling domain, and if any
CPU in a group is not idle, or is the target, we skip the whole
scheduling group. Isn't the scheduling group a group of CPUs? Why can't
we pick an idle CPU in a group that has a non-idle CPU or the target CPU?
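
To make the question concrete, here is a minimal user-space sketch of the two policies. It is not kernel code: the bitmask representation, first_cpu(), pick_whole_idle_group() and pick_any_idle_cpu() are all hypothetical names made up for illustration. The first function models the loop quoted above; the second models the alternative being asked about. Both return -1 when nothing is found, which is an assumption borrowed from the behavior Peter describes for his patch.

```c
#include <stdint.h>

/* Hypothetical model: bit i of a mask stands for CPU i (at most 32 CPUs). */

/* Return the index of the lowest set bit, or -1 if the mask is empty. */
static int first_cpu(uint32_t mask)
{
	for (int i = 0; i < 32; i++)
		if (mask & (UINT32_C(1) << i))
			return i;
	return -1;
}

/*
 * Models the quoted loop: a group is used only if every CPU in it is
 * idle and none of them is the target. One busy CPU (or the target)
 * disqualifies the whole group.
 */
int pick_whole_idle_group(const uint32_t *groups, int ngroups,
			  uint32_t idle, uint32_t allowed, int target)
{
	uint32_t target_bit = UINT32_C(1) << target;

	for (int g = 0; g < ngroups; g++) {
		if (!(groups[g] & allowed))
			continue;	/* task may not run anywhere in this group */
		if ((groups[g] & ~idle) || (groups[g] & target_bit))
			continue;	/* a busy CPU or the target: skip the group */
		return first_cpu(groups[g] & allowed);
	}
	return -1;
}

/*
 * Models the alternative from the question: take the first idle, allowed
 * CPU in any group, even if its neighbours are busy. Excluding the target
 * here is an assumption made to keep the two functions comparable.
 */
int pick_any_idle_cpu(const uint32_t *groups, int ngroups,
		      uint32_t idle, uint32_t allowed, int target)
{
	uint32_t target_bit = UINT32_C(1) << target;

	for (int g = 0; g < ngroups; g++) {
		uint32_t candidates = groups[g] & allowed & idle & ~target_bit;

		if (candidates)
			return first_cpu(candidates);
	}
	return -1;
}
```

With two groups {0,1} and {2,3}, target 0, and only CPUs 1 and 3 idle, the first function finds nothing because each group contains a busy CPU, while the second returns CPU 1.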
Thanks,
Josef