linux-kernel - Re: [RFC][PATCH 15/26] sched, numa: Implement hotplug hooks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4F672384.7030500@linux.vnet.ibm.com>
Date:	Mon, 19 Mar 2012 17:46:04 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Dan Smith <danms@...ibm.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC][PATCH 15/26] sched, numa: Implement hotplug hooks

On 03/16/2012 08:10 PM, Peter Zijlstra wrote:

> start/stop numa balance threads on-demand using cpu-hotlpug.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> ---
>  kernel/sched/numa.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 55 insertions(+), 7 deletions(-)
> --- a/kernel/sched/numa.c
> +++ b/kernel/sched/numa.c
> @@ -596,31 +596,79 @@ static int numad_thread(void *data)
>  	return 0;
>  }
> 
> +static int __cpuinit
> +numa_hotplug(struct notifier_block *nb, unsigned long action, void *hcpu)
> +{
> +	int cpu = (long)hcpu;
> +	int node = cpu_to_node(cpu);
> +	struct node_queue *nq = nq_of(node);
> +	struct task_struct *numad;
> +	int err = 0;
> +
> +	switch (action & ~CPU_TASKS_FROZEN) {
> +	case CPU_UP_PREPARE:
> +		if (nq->numad)
> +			break;
> +
> +		numad = kthread_create_on_node(numad_thread,
> +				nq, node, "numad/%d", node);
> +		if (IS_ERR(numad)) {
> +			err = PTR_ERR(numad);
> +			break;
> +		}
> +
> +		nq->numad = numad;
> +		nq->next_schedule = jiffies + HZ; // XXX sync-up?
> +		break;
> +
> +	case CPU_ONLINE:
> +		wake_up_process(nq->numad);
> +		break;
> +
> +	case CPU_DEAD:
> +	case CPU_UP_CANCELED:
> +		if (!nq->numad)
> +			break;
> +
> +		if (cpumask_any_and(cpu_online_mask,
> +				    cpumask_of_node(node)) >= nr_cpu_ids) {
> +			kthread_stop(nq->numad);
> +			nq->numad = NULL;
> +		}
> +		break;
> +	}
> +
> +	return notifier_from_errno(err);
> +}
> +
>  static __init int numa_init(void)
>  {
> -	int node;
> +	int node, cpu, err;
> 
>  	nqs = kzalloc(sizeof(struct node_queue*) * nr_node_ids, GFP_KERNEL);
>  	BUG_ON(!nqs);
> 
> -	for_each_node(node) { // XXX hotplug
> +	for_each_node(node) {
>  		struct node_queue *nq = kmalloc_node(sizeof(*nq),
>  				GFP_KERNEL | __GFP_ZERO, node);
>  		BUG_ON(!nq);
> 
> -		nq->numad = kthread_create_on_node(numad_thread,
> -				nq, node, "numad/%d", node);
> -		BUG_ON(IS_ERR(nq->numad));
> -
>  		spin_lock_init(&nq->lock);
>  		INIT_LIST_HEAD(&nq->entity_list);
> 
>  		nq->next_schedule = jiffies + HZ;
>  		nq->node = node;
>  		nqs[node] = nq;
> +	}
> 
> -		wake_up_process(nq->numad);
> +	get_online_cpus();
> +	cpu_notifier(numa_hotplug, 0);


ABBA deadlock!

CPU 0						CPU1
				echo 0/1 > /sys/devices/.../cpu*/online

					acquire cpu_add_remove_lock

get_online_cpus()
	acquire cpu_hotplug lock
					
					Blocked on cpu hotplug lock

cpu_notifier()
	acquire cpu_add_remove_lock

ABBA DEADLOCK!

[cpu_maps_update_begin/done() deal with cpu_add_remove_lock].

So, basically, at the moment there is no way to register a CPU Hotplug notifier
and do setup for all currently online cpus in a totally race-free manner.

One approach to fix this is to audit whether register_cpu_notifier() really needs
to take cpu_add_remove_lock and if no, then acquire cpu hotplug lock instead.

The other approach is to keep the existing lock ordering as it is and yet provide
a race-free way to register, as I had posted some time ago (incomplete/untested):

http://thread.gmane.org/gmane.linux.kernel/1258880/focus=15826


> +	for_each_online_cpu(cpu) {
> +		err = numa_hotplug(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
> +		BUG_ON(notifier_to_errno(err));
> +		numa_hotplug(NULL, CPU_ONLINE, (void *)(long)cpu);
>  	}
> +	put_online_cpus();
> 
>  	return 0;
>  }
> 
> 

 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/