Date:	Wed, 03 Jun 2009 04:55:26 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Gary Hade <garyhade@...ibm.com>
Cc:	mingo@...e.hu, mingo@...hat.com, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, hpa@...or.com, x86@...nel.org,
	yinghai@...nel.org, lcm@...ibm.com
Subject:	Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "inactive" device IRQ interruption

Gary Hade <garyhade@...ibm.com> writes:

> Impact: Eliminates a race that can leave the system in an
>         unusable state
>
> During rapid offlining of multiple CPUs there is a chance
> that the destination CPU of an IRQ affinity move will itself
> be offlined before the move, initiated during the offlining
> of a previous CPU, completes.  This can happen when the
> device is not very active and thus fails to generate the
> IRQ that is needed to complete the move before the move
> destination CPU is offlined.  When this happens,
> __assign_irq_vector() returns -EBUSY during the offlining
> of the move destination CPU, which prevents initiation of
> a new affinity move to an online CPU.  This leaves the IRQ
> affinity set to an offlined CPU.
>
> I have been able to reproduce the problem on some of our
> systems using the following script.  When the system is idle
> the problem often reproduces during the first CPU offlining
> sequence.

Nacked-by: "Eric W. Biederman" <ebiederm@...ssion.com>

fixup_irqs() is broken for allowing such a thing.
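
For context, the -EBUSY described above comes from the guard near
the top of __assign_irq_vector(), which refuses to hand out a new
vector while a previous affinity move is still pending.  A rough
sketch of the 2.6.30-era logic (simplified; the real function also
performs the per-CPU vector search, which is omitted here):

	static int
	__assign_irq_vector(int irq, struct irq_cfg *cfg,
			    const struct cpumask *mask)
	{
		/*
		 * A previous affinity move has not been cleaned up
		 * yet (the interrupt that would complete it never
		 * fired), so refuse to start another one.
		 */
		if (cfg->move_in_progress)
			return -EBUSY;

		/* ... find and install a free vector ... */
		return 0;
	}

If the device stays quiet, move_in_progress is never cleared and
every subsequent attempt to repoint the IRQ fails here, which is
exactly the situation the script below provokes.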

> #!/bin/sh
>
> SYS_CPU_DIR=/sys/devices/system/cpu
> VICTIM_IRQ=25
> IRQ_MASK=f0
>
> iteration=0
> while true; do
>   echo $iteration
>   # Repoint the victim IRQ at CPUs 4-7 (affinity mask 0xf0),
>   # then take every CPU except CPU 0 offline and back online.
>   echo $IRQ_MASK > /proc/irq/$VICTIM_IRQ/smp_affinity
>   for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
>     echo 0 > $cpudir/online
>   done
>   for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
>     echo 1 > $cpudir/online
>   done
>   iteration=`expr $iteration + 1`
> done
>
> The proposed fix takes advantage of the fact that when all
> CPUs in the old domain are offline, there is nothing for
> send_cleanup_vector() to do when the affinity move completes.
> So we simply avoid setting cfg->move_in_progress, preventing
> the above-mentioned -EBUSY return from __assign_irq_vector().
> This allows a new IRQ affinity move to a CPU that is not
> going offline to be initiated.
>
> Successfully tested with Ingo's linux-2.6-tip (32- and 64-bit
> builds) on the IBM x460, x3550 M2, x3850, and x3950 M2.
>
> v2: modified to integrate with Yinghai Lu's
>     "x86/irq: remove leftover code from NUMA_MIGRATE_IRQ_DESC"
>     patch, which modified intersecting lines.  Only comment
>     changes were affected; the actual code change is the same.
>
> Signed-off-by: Gary Hade <garyhade@...ibm.com>
>
> ---
>  arch/x86/kernel/apic/io_apic.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-2.6-tip/arch/x86/kernel/apic/io_apic.c
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/kernel/apic/io_apic.c	2009-05-14 14:06:30.000000000 -0700
> +++ linux-2.6-tip/arch/x86/kernel/apic/io_apic.c	2009-05-14 14:09:42.000000000 -0700
> @@ -1218,8 +1218,11 @@ next:
>  		current_vector = vector;
>  		current_offset = offset;
>  		if (old_vector) {
> -			cfg->move_in_progress = 1;
>  			cpumask_copy(cfg->old_domain, cfg->domain);
> +			if (cpumask_intersects(cfg->old_domain,
> +					       cpu_online_mask)) {
> +				cfg->move_in_progress = 1;
> +			}
>  		}
>  		for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
>  			per_cpu(vector_irq, new_cpu)[vector] = irq;
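
For reference, with the hunk applied the tail of the assignment
path reads as follows (reconstructed from the diff above):
cfg->move_in_progress is now set only when at least one CPU of
the old domain is still online, i.e. only when there is really a
cleanup left to wait for.

	current_vector = vector;
	current_offset = offset;
	if (old_vector) {
		cpumask_copy(cfg->old_domain, cfg->domain);
		/*
		 * Only flag the move as in progress if some CPU in
		 * the old domain is still online and can therefore
		 * receive the cleanup interrupt.
		 */
		if (cpumask_intersects(cfg->old_domain,
				       cpu_online_mask)) {
			cfg->move_in_progress = 1;
		}
	}
	for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
		per_cpu(vector_irq, new_cpu)[vector] = irq;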
