Date:	Wed, 5 Feb 2014 17:18:16 -0800 (PST)
From:	Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:	<srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	<konrad.wilk@...cle.com>, <rusty@...tcorp.com.au>,
	<ego@...ux.vnet.ibm.com>, <tglx@...utronix.de>, <mingo@...nel.org>,
	<xen-devel@...ts.xenproject.org>, <paulus@...ba.org>,
	<akpm@...ux-foundation.org>, <linux@....linux.org.uk>,
	<oleg@...hat.com>, <paulmck@...ux.vnet.ibm.com>,
	<david.vrabel@...rix.com>, <tj@...nel.org>, <walken@...gle.com>,
	<linux-kernel@...r.kernel.org>, <peterz@...radead.org>
Subject: Re: [PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration


----- srivatsa.bhat@...ux.vnet.ibm.com wrote:

> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
> 
> 	get_online_cpus();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	put_online_cpus();
> 
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
> 
> Interestingly, the balloon code in xen can actually prevent double
> initialization and hence can use the following simplified form of callback
> registration:
> 
> 	register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	get_online_cpus();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	put_online_cpus();
> 
> A hotplug operation that occurs between registering the notifier and calling
> get_online_cpus(), won't disrupt anything, because the code takes care to
> perform the memory allocations only once.
> 
> So reorganize the balloon code in xen this way to fix the deadlock with
> callback registration.
> 
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>
> Cc: David Vrabel <david.vrabel@...rix.com>
> Cc: xen-devel@...ts.xenproject.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
> ---
> 
>  drivers/xen/balloon.c |   35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 37d06ea..afe1a3f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
>  	}
>  }
>  
> +static int alloc_balloon_scratch_page(int cpu)
> +{
> +	if (per_cpu(balloon_scratch_page, cpu) != NULL)
> +		return 0;
> +
> +	per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> +	if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> +		pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +
>  static int balloon_cpu_notify(struct notifier_block *self,
>  				    unsigned long action, void *hcpu)
>  {
>  	int cpu = (long)hcpu;
>  	switch (action) {
>  	case CPU_UP_PREPARE:
> -		if (per_cpu(balloon_scratch_page, cpu) != NULL)
> -			break;
> -		per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -		if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -			pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		if (alloc_balloon_scratch_page(cpu))
>  			return NOTIFY_BAD;
> -		}
>  		break;
>  	default:
>  		break;
> @@ -624,15 +634,16 @@ static int __init balloon_init(void)
>  		return -ENODEV;
>  
>  	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -		for_each_online_cpu(cpu)
> -		{
> -			per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> -			if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> -				pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> +		register_cpu_notifier(&balloon_cpu_notifier);
> +
> +		get_online_cpus();
> +		for_each_online_cpu(cpu) {
> +			if (alloc_balloon_scratch_page(cpu)) {
> +				put_online_cpus();
>  				return -ENOMEM;


Not that the original code was doing a particularly thorough job of cleaning up on allocation failure, but if it couldn't get memory it would not register the notifier. So perhaps you should unregister it before returning here.
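
The unregister-on-failure ordering suggested here can be sketched roughly as follows. This is a userspace mock and not kernel code: mock_register_cpu_notifier(), mock_unregister_cpu_notifier(), scratch_page[], fail_on_cpu, NR_CPUS_SKETCH and balloon_init_sketch() are all hypothetical stand-ins, and the comments mark where get_online_cpus()/put_online_cpus() would sit in the real balloon_init().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count for the mock */

/* Hypothetical userspace stand-ins for the kernel state involved. */
static void *scratch_page[NR_CPUS_SKETCH];	/* models per_cpu(balloon_scratch_page, cpu) */
static bool notifier_registered;		/* models "notifier is on the chain" */
static int fail_on_cpu = -1;			/* test knob: CPU whose allocation fails */

static void mock_register_cpu_notifier(void)
{
	notifier_registered = true;
}

static void mock_unregister_cpu_notifier(void)
{
	notifier_registered = false;
}

/* Allocate the scratch page for one CPU only once, as in the patch. */
static int alloc_scratch_page(int cpu)
{
	if (scratch_page[cpu] != NULL)
		return 0;
	if (cpu == fail_on_cpu)
		return -ENOMEM;	/* simulated allocation failure */

	scratch_page[cpu] = malloc(4096);
	return scratch_page[cpu] ? 0 : -ENOMEM;
}

static int balloon_init_sketch(void)
{
	int cpu;

	mock_register_cpu_notifier();

	/* get_online_cpus() would be called here */
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (alloc_scratch_page(cpu)) {
			/* put_online_cpus() would be called here */
			mock_unregister_cpu_notifier();	/* don't leave it registered */
			return -ENOMEM;
		}
	}
	/* put_online_cpus() would be called here */

	return 0;
}
```

Like the original code, the sketch does not free pages that were already allocated for earlier CPUs; it only shows the notifier being dropped again before the error return.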

I am also not sure how we were susceptible to the deadlock here, since we didn't call get_online_cpus(). (We probably should have, but then the commit description should say so.)

-boris

>  			}
>  		}
> -		register_cpu_notifier(&balloon_cpu_notifier);
> +		put_online_cpus();
>  	}
>  
>  	pr_info("Initialising balloon driver\n");