linux-kernel - Re: [PATCH v2] sched: Check sched_domain before computing group power.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <52846863.4060001@linux.vnet.ibm.com>
Date:	Thu, 14 Nov 2013 11:36:27 +0530
From:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To:	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>
CC:	Preeti Murthy <preeti.lkml@...il.com>, mingo@...nel.org,
	hpa@...or.com, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>, mikey@...ling.org,
	linux-tip-commits@...r.kernel.org
Subject: Re: [PATCH v2] sched: Check sched_domain before computing group power.

Hi,

On 11/13/2013 04:53 PM, Srikar Dronamraju wrote:
> * Preeti Murthy <preeti.lkml@...il.com> [2013-11-13 16:22:37]:
> 
>> Hi Srikar,
>>
>> update_group_power() is called only during load balancing during
>> update_sg_lb_stats().
>> Load balancing begins at the base domain of the CPU,rq(cpu)->sd. This is
>> checked for
>> NULL. So how can update_group_power() be called in a scenario where the
>> base domain
>> of the CPU is not initialized? I say 'initialized' since you check for NULL
>> on rq(cpu)->sd.
>>
> 
> update_group_power() also gets called from init_sched_groups_power().
> And if you see the oops message, we know that the oops happens from that
> path. In build_sched_domains(), we do cpu_attach_domain() what updates
> rq->sd after the call to init_sched_groups_power(). So by the time
> init_sched_groups_power() is called rq->sd is not yet initialized. 
> 
> We only hit oops case, when the sd->flags has SD_OVERLAP set.
> 
> 
> 
>> In the changelog, you say 'updated'. Are you saying that it has a stale
>> value?
> 
> I said, "before the sched_domain for a cpu is updated", so its not yet
> updated or has stale value. As I said earlier in this mail, the
> initialization happens after we do a update_group_power().
> 
>> Please do elaborate on how you observed this.
>>
> 
> Does this clarify?

Yes it clarifies, thank you.

However I was thinking that a better fix would be to reorder the way we call
update_group_power() and cpu_attach_domain(). Why do we need to do
update_group_power() of the groups of the sched domains that would probably
degenerate in cpu_attach_domain()? So it seemed best to move update_group_power()
to after cpu_attach_domain() so that it saves unnecessary iterations over
sched domains which could degenerate, and it fixes the issue that you have brought out
as well. See below for the patch:

-------------------------------------------------------------------------------

sched: Update power of sched groups after sched domains have been attached to CPUs

From: Preeti U Murthy <preeti@...ux.vnet.ibm.com>

Avoid iterating unnecessarily over the sched domains which could potentially
degenerate,  while updating sched groups' power. This can be done by moving
the call to init_sched_groups_power() to after cpu_attach_domain(), when the
possibility of degenerating sched domains is examined and appropriately sched
domains are degenerated.

But claim_allocations() which does a NULL on the struct sd_data members for
each sched domain should iterate over all the initally built sched domains.
So move claim_allocations() to a loop where we build sched groups for each domain.
We would not require to reference sd_data after sched domains and sched
groups have been built.

Another use of this re-ordering is with reference to the commit
"sched/fair: Fix group power_orig computation". After this commit,
we end up dereferencing cpu_rq(cpu)->sd in update_group_power(). This would
lead to a NULL pointer since this parameter is updated after call to
update_group_power() in build_sched_domains() during initialization of sched
domains per CPU. The below change prevents this from occuring.

Signed-off-by: Preeti U. Murthy <preeti@...ux.vnet.ibm.com>
---
 kernel/sched/core.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e6a6244..d9703ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6211,17 +6211,7 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 				if (build_sched_groups(sd, i))
 					goto error;
 			}
-		}
-	}
-
-	/* Calculate CPU power for physical packages and nodes */
-	for (i = nr_cpumask_bits-1; i >= 0; i--) {
-		if (!cpumask_test_cpu(i, cpu_map))
-			continue;
-
-		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
 			claim_allocations(i, sd);
-			init_sched_groups_power(i, sd);
 		}
 	}

@@ -6233,6 +6223,16 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 	}
 	rcu_read_unlock();

+	/* Calculate CPU power for physical packages and nodes */
+	for (i = nr_cpumask_bits-1; i >= 0; i--) {
+		if (!cpumask_test_cpu(i, cpu_map))
+			continue;
+
+		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent)
+			init_sched_groups_power(i, sd);
+	}
+
+
 	ret = 0;
 error:
 	__free_domain_allocs(&d, alloc_state, cpu_map);

> 
Regards
Preeti U. Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/