Message-ID: <20140516133805.GS11096@twins.programming.kicks-ass.net>
Date:	Fri, 16 May 2014 15:38:05 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Rik van Riel <riel@...hat.com>
Cc:	linux-kernel@...r.kernel.org, chegu_vinod@...com, mingo@...nel.org,
	umgwanakikbuti@...il.com
Subject: Re: [PATCH RFC] sched,numa: move tasks to preferred_node at wakeup time

On Fri, May 16, 2014 at 02:14:50AM -0400, Rik van Riel wrote:
> +#ifdef CONFIG_NUMA_BALANCING
> +static int numa_balance_on_wake(struct task_struct *p, int prev_cpu)
> +{
> +	long load, src_load, dst_load;
> +	int cur_node = cpu_to_node(prev_cpu);
> +	struct numa_group *numa_group = ACCESS_ONCE(p->numa_group);
> +	struct sched_domain *sd;
> +	struct task_numa_env env = {
> +		.p = p,
> +		.best_task = NULL,
> +		.best_imp = 0,
> +		.best_cpu = -1
> +	};

That's all code, ideally you'd move that after we're done checking the
reasons to not do work, say somewhere like... (condensed sketch below,
after the next hunk).

> +
> +	if (!sched_feat(NUMA))
> +		return prev_cpu;

Yah.. :-( I think some people changed that to numabalancing_enabled.

Fixing that is still on the todo list somewhere.
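
That check would then read something like this (assuming the
numabalancing_enabled knob mentioned above):

	/* Test the runtime knob rather than the sched feature bit. */
	if (!numabalancing_enabled)
		return prev_cpu;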

> +
> +	if (p->numa_preferred_nid == -1)
> +		return prev_cpu;
> +
> +	if (p->numa_preferred_nid == cur_node)
> +		return prev_cpu;
> +
> +	if (numa_group && node_isset(cur_node, numa_group->active_nodes))
> +		return prev_cpu;
> +
> +	sd = rcu_dereference(per_cpu(sd_numa, env.src_cpu));
> +	if (sd)
> +		env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
> +
> +	/*
> +	 * Cpusets can break the scheduler domain tree into smaller
> +	 * balance domains, some of which do not cross NUMA boundaries.
> +	 * Tasks that are "trapped" in such domains cannot be migrated
> +	 * elsewhere, so there is no point in (re)trying.
> +	 */
> +	if (unlikely(!sd)) {

How about you bail early, and then have the above test evaporate?

> +		p->numa_preferred_nid = cur_node;
> +		return prev_cpu;
> +	}

.. here.
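
Put together, the top of the function would then look something like
this (condensed sketch, not compile-tested; I'm passing prev_cpu to
per_cpu() directly since env.src_cpu isn't set in the hunk above):

static int numa_balance_on_wake(struct task_struct *p, int prev_cpu)
{
	int cur_node = cpu_to_node(prev_cpu);
	struct numa_group *numa_group = ACCESS_ONCE(p->numa_group);
	struct task_numa_env env;
	struct sched_domain *sd;
	long load, src_load, dst_load;

	/* All the cheap "nothing to do" tests first, no setup work yet. */
	if (!sched_feat(NUMA))
		return prev_cpu;

	if (p->numa_preferred_nid == -1 || p->numa_preferred_nid == cur_node)
		return prev_cpu;

	if (numa_group && node_isset(cur_node, numa_group->active_nodes))
		return prev_cpu;

	/* Bail early when cpusets trapped us; the unlikely() test is gone. */
	sd = rcu_dereference(per_cpu(sd_numa, prev_cpu));
	if (!sd) {
		p->numa_preferred_nid = cur_node;
		return prev_cpu;
	}

	/* Only now pay for setting up env. */
	env = (struct task_numa_env){
		.p = p,
		.best_task = NULL,
		.best_imp = 0,
		.best_cpu = -1,
		.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2,
	};
	...
}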

> +
> +	/*
> +	 * Only allow p to move back to its preferred nid if
> +	 * that does not create an imbalance that would cause
> +	 * the load balancer to move a task around later.
> +	 */
> +	env.src_nid = cur_node;
> +	env.dst_nid = p->numa_preferred_nid;
> +
> +	update_numa_stats(&env.src_stats, env.src_nid);
> +	update_numa_stats(&env.dst_stats, env.dst_nid);
> +
> +	dst_load = env.dst_stats.load;
> +	src_load = env.src_stats.load;
> +
> +	/* XXX missing power terms */
> +	load = task_h_load(p);
> +	dst_load += load;
> +	src_load -= load;
> +
> +	if (load_too_imbalanced(env.src_stats.load, env.dst_stats.load,
> +				src_load, dst_load, &env))
> +		return prev_cpu;

So I'm thinking that load_too_imbalanced() is from another patch I
haven't yet seen, lemme go see if you did send it and I missed it.
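
In the meantime, given the call above, the shape I'd expect is roughly
the below (pure guesswork until I find that patch): reject the move if
it pushes the imbalance past imbalance_pct, unless the imbalance was
already at least as bad before.

static bool load_too_imbalanced(long orig_src_load, long orig_dst_load,
				long src_load, long dst_load,
				struct task_numa_env *env)
{
	long imb, old_imb;

	/* Only the magnitude of the imbalance matters here. */
	if (dst_load < src_load)
		swap(dst_load, src_load);

	/* New imbalance still within imbalance_pct? Then it's fine. */
	imb = dst_load * 100 - src_load * env->imbalance_pct;
	if (imb <= 0)
		return false;

	/* Over the threshold; tolerate it only if it improves on the old. */
	if (orig_dst_load < orig_src_load)
		swap(orig_dst_load, orig_src_load);

	old_imb = orig_dst_load * 100 - orig_src_load * env->imbalance_pct;

	/* Would this move make things worse than they already were? */
	return imb > old_imb;
}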

> +
> +	return cpumask_first(cpumask_of_node(p->numa_preferred_nid));
> +}
