linux-kernel - Re: [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55A51F10.7010407@oracle.com>
Date:	Tue, 14 Jul 2015 10:39:12 -0400
From:	Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>
CC:	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Peter Anvin <hpa@...or.com>, xiao jin <jin.xiao@...el.com>,
	Joerg Roedel <jroedel@...e.de>, Borislav Petkov <bp@...e.de>,
	Yanmin Zhang <yanmin_zhang@...ux.intel.com>,
	xen-devel <xen-devel@...ts.xenproject.org>
Subject: Re: [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during
 cpu up/down

On 07/05/2015 01:12 PM, Thomas Gleixner wrote:
> When a cpu goes up some architectures (e.g. x86) have to walk the irq
> space to set up the vector space for the cpu. While this needs extra
> protection at the architecture level we can avoid a few race
> conditions by preventing the concurrent allocation/free of irq
> descriptors and the associated data.
>
> When a cpu goes down it moves the interrupts which are targeted to
> this cpu away by reassigning the affinities. While this happens
> interrupts can be allocated and freed, which opens a can of race
> conditions in the code which reassignes the affinities because
> interrupt descriptors might be freed underneath.
>
> Example:
>
> CPU1				CPU2
> cpu_up/down
>   irq_desc = irq_to_desc(irq);
> 				remove_from_radix_tree(desc);
>   raw_spin_lock(&desc->lock);
> 				free(desc);
>
> We could protect the irq descriptors with RCU, but that would require
> a full tree change of all accesses to interrupt descriptors. But
> fortunately these kind of race conditions are rather limited to a few
> things like cpu hotplug. The normal setup/teardown is very well
> serialized. So the simpler and obvious solution is:
>
> Prevent allocation and freeing of interrupt descriptors accross cpu
> hotplug.


This breaks Xen guests that allocate interrupt descriptors in .cpu_up().

Any chance this locking can be moved into arch code? Otherwise we will 
need to have something like arch_post_cpu_up() after the lock is released.

(The patch doesn't appear to have any side effects for the down path 
since Xen guests deallocate descriptors in __cpu_die()).


-boris


>
> Signed-off-by: Thomas Gleixner<tglx@...utronix.de>
> ---
>   include/linux/irqdesc.h |    7 ++++++-
>   kernel/cpu.c            |   21 ++++++++++++++++++++-
>   kernel/irq/internals.h  |    4 ----
>   3 files changed, 26 insertions(+), 6 deletions(-)
>
> Index: tip/include/linux/irqdesc.h
> ===================================================================
> --- tip.orig/include/linux/irqdesc.h
> +++ tip/include/linux/irqdesc.h
> @@ -90,7 +90,12 @@ struct irq_desc {
>   	const char		*name;
>   } ____cacheline_internodealigned_in_smp;
>
> -#ifndef CONFIG_SPARSE_IRQ
> +#ifdef CONFIG_SPARSE_IRQ
> +extern void irq_lock_sparse(void);
> +extern void irq_unlock_sparse(void);
> +#else
> +static inline void irq_lock_sparse(void) { }
> +static inline void irq_unlock_sparse(void) { }
>   extern struct irq_desc irq_desc[NR_IRQS];
>   #endif
>
> Index: tip/kernel/cpu.c
> ===================================================================
> --- tip.orig/kernel/cpu.c
> +++ tip/kernel/cpu.c
> @@ -392,13 +392,19 @@ static int __ref _cpu_down(unsigned int
>   	smpboot_park_threads(cpu);
>
>   	/*
> -	 * So now all preempt/rcu users must observe !cpu_active().
> +	 * Prevent irq alloc/free while the dying cpu reorganizes the
> +	 * interrupt affinities.
>   	 */
> +	irq_lock_sparse();
>
> +	/*
> +	 * So now all preempt/rcu users must observe !cpu_active().
> +	 */
>   	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>   	if (err) {
>   		/* CPU didn't die: tell everyone.  Can't complain. */
>   		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> +		irq_unlock_sparse();
>   		goto out_release;
>   	}
>   	BUG_ON(cpu_online(cpu));
> @@ -415,6 +421,9 @@ static int __ref _cpu_down(unsigned int
>   	smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
>   	per_cpu(cpu_dead_idle, cpu) = false;
>
> +	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
> +	irq_unlock_sparse();
> +
>   	hotplug_cpu__broadcast_tick_pull(cpu);
>   	/* This actually kills the CPU. */
>   	__cpu_die(cpu);
> @@ -517,8 +526,18 @@ static int _cpu_up(unsigned int cpu, int
>   		goto out_notify;
>   	}
>
> +	/*
> +	 * Some architectures have to walk the irq descriptors to
> +	 * setup the vector space for the cpu which comes online.
> +	 * Prevent irq alloc/free across the bringup.
> +	 */
> +	irq_lock_sparse();
> +
>   	/* Arch-specific enabling code. */
>   	ret = __cpu_up(cpu, idle);
> +
> +	irq_unlock_sparse();
> +
>   	if (ret != 0)
>   		goto out_notify;
>   	BUG_ON(!cpu_online(cpu));
> Index: tip/kernel/irq/internals.h
> ===================================================================
> --- tip.orig/kernel/irq/internals.h
> +++ tip/kernel/irq/internals.h
> @@ -76,12 +76,8 @@ extern void unmask_threaded_irq(struct i
>
>   #ifdef CONFIG_SPARSE_IRQ
>   static inline void irq_mark_irq(unsigned int irq) { }
> -extern void irq_lock_sparse(void);
> -extern void irq_unlock_sparse(void);
>   #else
>   extern void irq_mark_irq(unsigned int irq);
> -static inline void irq_lock_sparse(void) { }
> -static inline void irq_unlock_sparse(void) { }
>   #endif
>
>   extern void init_kstat_irqs(struct irq_desc *desc, int node, int nr);
>
>
> --

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/