Message-ID: <20171002124348.wggjjboetajg6uwe@hirez.programming.kicks-ass.net>
Date:   Mon, 2 Oct 2017 14:43:48 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com, bp@...e.de
Subject: Re: [RFC PATCH] sched/topology: Introduce NUMA identity node sched
 domain


Sorry, lost track of this thing..

On Mon, Aug 14, 2017 at 02:44:59PM +0700, Suravee Suthikulpanit wrote:

> > Why does the degenerate code fail to remove things?
> > 
> 
> Sorry for the confusion. Actually, the degenerate code does remove the duplicate
> NODE sched-domain.
> 
> The difference is, without !cpumask_equal(), now the MC sched-domain
> would have the SD_PREFER_SIBLING flag set by the degenerate code since
> the flag got transferred down from the NODE to MC sched-domain. Would
> this be the preferred behavior for MC sched-domain?

Right, but the 'normal' x86_topology setup will have SD_PREFER_SIBLING
set on the MC domain by virtue of merging the MC and CPU domains. So the
first layer of NUMA spreads.

Now, that appears not to be the case for x86_numa_in_package_topology,
since that doesn't do the CPU domain, but your patch would effectively
'fix' that if it were to not have that conditional. But it would keep
the PREFER_SIBLING at the NODE level; MC would not get it.
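
To illustrate the hand-down during degeneration: it works roughly like
the toy userspace model below. This is not the kernel code, only a
sketch of the mechanism in cpu_attach_domain(); all names and numbers
in it are invented.

/*
 * Toy userspace model (not kernel code): a parent domain that spans
 * the same CPUs as its child is redundant, and when it is removed the
 * child inherits SD_PREFER_SIBLING, which is the transfer described
 * in the quoted mail above.
 */
#include <stdio.h>

#define SD_PREFER_SIBLING	0x1

struct toy_domain {
	const char *name;
	unsigned int flags;
	unsigned int span_weight;	/* nr of CPUs covered */
	struct toy_domain *parent;
};

/* A parent spanning the same CPUs as its child adds nothing. */
static int toy_degenerate(struct toy_domain *sd, struct toy_domain *parent)
{
	return sd->span_weight == parent->span_weight;
}

int main(void)
{
	struct toy_domain numa = { "NUMA", 0, 16, NULL };
	struct toy_domain node = { "NODE", SD_PREFER_SIBLING, 4, &numa };
	struct toy_domain mc   = { "MC",   0, 4, &node };

	/* NODE spans the same 4 CPUs as MC, so it degenerates ... */
	if (toy_degenerate(&mc, mc.parent)) {
		/* ... and MC inherits the spreading behaviour. */
		if (mc.parent->flags & SD_PREFER_SIBLING)
			mc.flags |= SD_PREFER_SIBLING;
		mc.parent = mc.parent->parent;
	}

	printf("MC flags: %#x, parent: %s\n", mc.flags, mc.parent->name);
	return 0;
}

Run as-is it prints MC ending up with SD_PREFER_SIBLING and reparented
to NUMA, which is the flag transfer Suravee describes above.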

But I'm wondering if we're not doing this all a bit backwards. Having
SD_PREFER_SIBLING on the L3 makes sense in that it will spread workload
across the machine and maximize cache utilization. But we don't set it
directly; instead we inherit it through merging.

So commit 8e7fbcbc22c1 ("sched: Remove stale power aware scheduling
remnants and dysfunctional knobs") got this wrong by not preserving
SD_PREFER_SIBLING for the !power_saving case on both CPU and MC.

Then commit 6956dc568f34 ("sched/numa: Add SD_PERFER_SIBLING to CPU
domain") added it back to the CPU domain but not to MC.

So I'm thinking it makes sense to restore the explicit MC one and make
your NODE domain unconditional.

That would make behaviour consistent between the various modes;
something like the below. I'll go try and update your Changelog so we
can get this merged.

--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1157,6 +1157,7 @@ sd_init(struct sched_domain_topology_lev
 		sd->smt_gain = 1178; /* ~15% */
 
 	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
+		sd->flags |= SD_PREFER_SIBLING;
 		sd->imbalance_pct = 117;
 		sd->cache_nice_tries = 1;
 		sd->busy_idx = 2;
@@ -1331,6 +1332,10 @@ void sched_init_numa(void)
 	if (!sched_domains_numa_distance)
 		return;
 
+	/* Includes NUMA identity node at level 0. */
+	sched_domains_numa_distance[level++] = curr_distance;
+	sched_domains_numa_levels = level;
+
 	/*
 	 * O(nr_nodes^2) deduplicating selection sort -- in order to find the
 	 * unique distances in the node_distance() table.
@@ -1378,8 +1383,7 @@ void sched_init_numa(void)
 		return;
 
 	/*
-	 * 'level' contains the number of unique distances, excluding the
-	 * identity distance node_distance(i,i).
+	 * 'level' contains the number of unique distances
 	 *
 	 * The sched_domains_numa_distance[] array includes the actual distance
 	 * numbers.
@@ -1441,9 +1445,18 @@ void sched_init_numa(void)
 		tl[i] = sched_domain_topology[i];
 
 	/*
+	 * Add the NUMA identity distance, aka single NODE.
+	 */
+	tl[i++] = (struct sched_domain_topology_level){
+		.mask = sd_numa_mask,
+		.numa_level = 0,
+		SD_INIT_NAME(NODE)
+	};
+
+	/*
 	 * .. and append 'j' levels of NUMA goodness.
 	 */
-	for (j = 0; j < level; i++, j++) {
+	for (j = 1; j < level; i++, j++) {
 		tl[i] = (struct sched_domain_topology_level){
 			.mask = sd_numa_mask,
 			.sd_flags = cpu_numa_flags,
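
For reference, the effect of the sched_init_numa() hunk above can be
seen in the small standalone sketch below. It is plain userspace C,
not the kernel function, and uses a simplified uniqueness scan rather
than the deduplicating selection sort; the distance table is made up.

/*
 * Standalone sketch (not kernel code): collect the unique NUMA
 * distances from a made-up distance table, recording the identity
 * distance as level 0 so that a single-node NODE level always exists.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_NODES 4

/* Made-up distance table; 10 is the local (identity) distance. */
static const int node_dist[NR_NODES][NR_NODES] = {
	{ 10, 16, 22, 22 },
	{ 16, 10, 22, 22 },
	{ 22, 22, 10, 16 },
	{ 22, 22, 16, 10 },
};

int main(void)
{
	int distances[NR_NODES * NR_NODES];
	int level = 0;

	/* Level 0 is the NUMA identity distance, i.e. a single node. */
	distances[level++] = node_dist[0][0];

	/* O(nr_nodes^2) scan for the remaining unique distances. */
	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			bool seen = false;

			for (int k = 0; k < level; k++) {
				if (distances[k] == node_dist[i][j])
					seen = true;
			}
			if (!seen)
				distances[level++] = node_dist[i][j];
		}
	}

	/* Level 0 is the NODE domain; levels 1..level-1 are NUMA proper. */
	for (int k = 0; k < level; k++)
		printf("level %d: distance %d\n", k, distances[k]);

	return 0;
}

With this table it prints three levels: the identity distance 10 at
level 0 and the two remote distances 16 and 22 above it, matching the
'j = 1' start of the NUMA levels in the patched loop.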
