Message-ID: <7c7623ad-516f-4615-8923-c64ea203636c@default>
Date: Wed, 5 Feb 2014 17:18:16 -0800 (PST)
From: Boris Ostrovsky <boris.ostrovsky@...cle.com>
To: <srivatsa.bhat@...ux.vnet.ibm.com>
Cc: <konrad.wilk@...cle.com>, <rusty@...tcorp.com.au>,
<ego@...ux.vnet.ibm.com>, <tglx@...utronix.de>, <mingo@...nel.org>,
<xen-devel@...ts.xenproject.org>, <paulus@...ba.org>,
<akpm@...ux-foundation.org>, <linux@....linux.org.uk>,
<oleg@...hat.com>, <paulmck@...ux.vnet.ibm.com>,
<david.vrabel@...rix.com>, <tj@...nel.org>, <walken@...gle.com>,
<linux-kernel@...r.kernel.org>, <peterz@...radead.org>
Subject: Re: [PATCH 44/51] xen, balloon: Fix CPU hotplug callback registration
----- srivatsa.bhat@...ux.vnet.ibm.com wrote:
> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
>
> get_online_cpus();
>
> for_each_online_cpu(cpu)
> init_cpu(cpu);
>
> register_cpu_notifier(&foobar_cpu_notifier);
>
> put_online_cpus();
>
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
>
> Interestingly, the balloon code in xen can actually prevent double
> initialization and hence can use the following simplified form of callback
> registration:
>
> register_cpu_notifier(&foobar_cpu_notifier);
>
> get_online_cpus();
>
> for_each_online_cpu(cpu)
> init_cpu(cpu);
>
> put_online_cpus();
>
> A hotplug operation that occurs between registering the notifier and calling
> get_online_cpus(), won't disrupt anything, because the code takes care to
> perform the memory allocations only once.
>
> So reorganize the balloon code in xen this way to fix the deadlock with
> callback registration.
>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>
> Cc: David Vrabel <david.vrabel@...rix.com>
> Cc: xen-devel@...ts.xenproject.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
> ---
>
> drivers/xen/balloon.c | 35 +++++++++++++++++++++++------------
> 1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 37d06ea..afe1a3f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -592,19 +592,29 @@ static void __init balloon_add_region(unsigned long start_pfn,
> }
> }
>
> +static int alloc_balloon_scratch_page(int cpu)
> +{
> + if (per_cpu(balloon_scratch_page, cpu) != NULL)
> + return 0;
> +
> + per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> + if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> + pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +
> static int balloon_cpu_notify(struct notifier_block *self,
> unsigned long action, void *hcpu)
> {
> int cpu = (long)hcpu;
> switch (action) {
> case CPU_UP_PREPARE:
> - if (per_cpu(balloon_scratch_page, cpu) != NULL)
> - break;
> - per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> - if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> - pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> + if (alloc_balloon_scratch_page(cpu))
> return NOTIFY_BAD;
> - }
> break;
> default:
> break;
> @@ -624,15 +634,16 @@ static int __init balloon_init(void)
> return -ENODEV;
>
> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> - for_each_online_cpu(cpu)
> - {
> - per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
> - if (per_cpu(balloon_scratch_page, cpu) == NULL) {
> - pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
> + register_cpu_notifier(&balloon_cpu_notifier);
> +
> + get_online_cpus();
> + for_each_online_cpu(cpu) {
> + if (alloc_balloon_scratch_page(cpu)) {
> + put_online_cpus();
> return -ENOMEM;

Not that the original code was doing a particularly thorough job of cleaning up on allocation failure, but if it couldn't get memory it would not register the notifier. So perhaps you should unregister it before returning here.

I am also not sure how we were susceptible to the deadlock here, since we didn't call get_online_cpus(). (We probably should have, but then the commit description should say so.)

-boris
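
The unregister-on-failure suggestion above could look roughly like the following. This is only a userspace sketch, not the actual patch: every sketch_* name is hypothetical and stands in for the kernel facility named in the comment next to it, and sketch_fail_on_cpu is a test knob that exists only to force an allocation failure. It assumes the notifier should be unregistered only after put_online_cpus(), since unregistering while still holding the hotplug read side would likely bring back the lock ordering the patch is trying to avoid.

```c
/* Userspace sketch only: every sketch_* name is hypothetical and merely
 * stands in for the kernel facility named in the comment beside it. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4

static void *sketch_scratch_page[SKETCH_NR_CPUS]; /* per_cpu(balloon_scratch_page, cpu) */
static bool sketch_notifier_registered;           /* register/unregister_cpu_notifier() state */
static int sketch_hotplug_readers;                /* get/put_online_cpus() nesting depth */
static int sketch_fail_on_cpu = -1;               /* test knob: CPU whose allocation fails */

static void sketch_get_online_cpus(void)
{
	sketch_hotplug_readers++;
}

static void sketch_put_online_cpus(void)
{
	assert(sketch_hotplug_readers > 0);
	sketch_hotplug_readers--;
}

/* Mirrors alloc_balloon_scratch_page() from the patch: allocate at most once. */
static int sketch_alloc_scratch_page(int cpu)
{
	if (sketch_scratch_page[cpu] != NULL)
		return 0;
	if (cpu == sketch_fail_on_cpu)
		return -ENOMEM;
	sketch_scratch_page[cpu] = malloc(4096);
	return sketch_scratch_page[cpu] ? 0 : -ENOMEM;
}

static int sketch_balloon_init(void)
{
	int cpu;

	sketch_notifier_registered = true;	/* register_cpu_notifier() */

	sketch_get_online_cpus();
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (sketch_alloc_scratch_page(cpu)) {
			/* Drop the hotplug read side first, then undo the registration. */
			sketch_put_online_cpus();
			sketch_notifier_registered = false;	/* unregister_cpu_notifier() */
			return -ENOMEM;
		}
	}
	sketch_put_online_cpus();
	return 0;
}
```

Like the original code, the sketch does not free pages that were allocated before the failure. Because the allocation helper skips CPUs that already have a page, a later retry simply picks up where the failed attempt stopped.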
> }
> }
> - register_cpu_notifier(&balloon_cpu_notifier);
> + put_online_cpus();
> }
>
> pr_info("Initialising balloon driver\n");