Message-ID: <52846863.4060001@linux.vnet.ibm.com>
Date: Thu, 14 Nov 2013 11:36:27 +0530
From: Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>
CC: Preeti Murthy <preeti.lkml@...il.com>, mingo@...nel.org,
hpa@...or.com, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>, mikey@...ling.org,
linux-tip-commits@...r.kernel.org
Subject: Re: [PATCH v2] sched: Check sched_domain before computing group power.
Hi,
On 11/13/2013 04:53 PM, Srikar Dronamraju wrote:
> * Preeti Murthy <preeti.lkml@...il.com> [2013-11-13 16:22:37]:
>
>> Hi Srikar,
>>
>> update_group_power() is called only during load balancing, from
>> update_sg_lb_stats().
>> Load balancing begins at the base domain of the CPU, rq(cpu)->sd. This is
>> checked for NULL. So how can update_group_power() be called in a scenario
>> where the base domain of the CPU is not initialized? I say 'initialized'
>> since you check for NULL on rq(cpu)->sd.
>>
>
> update_group_power() also gets called from init_sched_groups_power().
> And from the oops message, we know that the oops happens from that
> path. In build_sched_domains(), we do cpu_attach_domain(), which updates
> rq->sd, after the call to init_sched_groups_power(). So by the time
> init_sched_groups_power() is called, rq->sd is not yet initialized.
>
> We only hit the oops when sd->flags has SD_OVERLAP set.
>
>
>
>> In the changelog, you say 'updated'. Are you saying that it has a stale
>> value?
>
> I said, "before the sched_domain for a cpu is updated", so it's not yet
> updated or has a stale value. As I said earlier in this mail, the
> initialization happens after we call update_group_power().
>
>> Please do elaborate on how you observed this.
>>
>
> Does this clarify?
Yes, that clarifies it, thank you.
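
To restate the sequence in code form, here is a small stand-alone model of the
ordering you describe (hypothetical userspace stubs, not the kernel sources;
only the function names mirror the real ones, and the signatures are
simplified):

#include <stdio.h>

struct sched_domain { int flags; };
struct rq { struct sched_domain *sd; };		/* models cpu_rq(cpu)->sd */

static struct rq fake_rq = { .sd = NULL };	/* domain not attached yet */

/* models update_group_power() after "sched/fair: Fix group power_orig computation" */
static void update_group_power(void)
{
	if (!fake_rq.sd) {
		/* in the kernel this is where the NULL pointer dereference hits */
		printf("rq->sd is NULL here -> oops\n");
		return;
	}
	printf("rq->sd is valid, safe to use\n");
}

static void init_sched_groups_power(void)
{
	update_group_power();
}

static void cpu_attach_domain(struct sched_domain *sd)
{
	fake_rq.sd = sd;
}

int main(void)
{
	struct sched_domain sd = { .flags = 0 };

	/* current ordering in build_sched_domains() */
	init_sched_groups_power();	/* runs first: rq->sd is still NULL */
	cpu_attach_domain(&sd);		/* rq->sd gets set only afterwards */

	return 0;
}
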
However, I was thinking that a better fix would be to reorder the calls to
update_group_power() and cpu_attach_domain(). Why do we need to do
update_group_power() on the groups of sched domains that would probably
degenerate in cpu_attach_domain()? So it seemed best to move update_group_power()
to after cpu_attach_domain(): that saves unnecessary iterations over sched
domains which could degenerate, and it fixes the issue that you have brought out
as well. See below for the patch, followed by a rough sketch of how the
resulting code would read:
-------------------------------------------------------------------------------
sched: Update power of sched groups after sched domains have been attached to CPUs
From: Preeti U Murthy <preeti@...ux.vnet.ibm.com>
Avoid iterating unnecessarily over sched domains which could potentially
degenerate while updating the sched groups' power. This can be done by moving
the call to init_sched_groups_power() to after cpu_attach_domain(), which is
where sched domains are examined for possible degeneration and degenerated
where appropriate.
But claim_allocations(), which NULLs out the struct sd_data members for each
sched domain, must still iterate over all the initially built sched domains.
So move claim_allocations() into the loop where we build the sched groups for
each domain. We no longer need to reference sd_data after the sched domains
and sched groups have been built.
Another benefit of this re-ordering relates to the commit
"sched/fair: Fix group power_orig computation". After that commit, we end up
dereferencing cpu_rq(cpu)->sd in update_group_power(). This leads to a NULL
pointer dereference, since rq->sd is set only after the call to
update_group_power() in build_sched_domains(), during the per-CPU
initialization of sched domains. The change below prevents this from occurring.
Signed-off-by: Preeti U. Murthy <preeti@...ux.vnet.ibm.com>
---
kernel/sched/core.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e6a6244..d9703ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6211,17 +6211,7 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 				if (build_sched_groups(sd, i))
 					goto error;
 			}
-		}
-	}
-
-	/* Calculate CPU power for physical packages and nodes */
-	for (i = nr_cpumask_bits-1; i >= 0; i--) {
-		if (!cpumask_test_cpu(i, cpu_map))
-			continue;
-
-		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
 			claim_allocations(i, sd);
-			init_sched_groups_power(i, sd);
 		}
 	}
 
@@ -6233,6 +6223,16 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 	}
 	rcu_read_unlock();
 
+	/* Calculate CPU power for physical packages and nodes */
+	for (i = nr_cpumask_bits-1; i >= 0; i--) {
+		if (!cpumask_test_cpu(i, cpu_map))
+			continue;
+
+		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent)
+			init_sched_groups_power(i, sd);
+	}
+
+
 	ret = 0;
 error:
 	__free_domain_allocs(&d, alloc_state, cpu_map);
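
For ease of review, this is roughly how the tail of build_sched_domains() would
read with the above applied (reconstructed from the diff and the surrounding
code, so treat it as a sketch rather than the exact result):

	/* Build the groups for the domains */
	for_each_cpu(i, cpu_map) {
		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
			/* ... build overlap/regular sched groups ... */
			claim_allocations(i, sd);
		}
	}

	/* Attach the domains */
	rcu_read_lock();
	for_each_cpu(i, cpu_map) {
		sd = *per_cpu_ptr(d.sd, i);
		cpu_attach_domain(sd, d.rd, i);
	}
	rcu_read_unlock();

	/*
	 * Calculate CPU power for physical packages and nodes,
	 * now that possibly degenerate domains have been dropped.
	 */
	for (i = nr_cpumask_bits-1; i >= 0; i--) {
		if (!cpumask_test_cpu(i, cpu_map))
			continue;

		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent)
			init_sched_groups_power(i, sd);
	}
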
Regards
Preeti U. Murthy