lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Fri, 5 Oct 2007 22:37:07 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>
cc:	Andi Kleen <ak@...e.de>, Arjan van de Ven <arjan@...radead.org>,
	David Bahi <dbahi@...ell.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-rt-users@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Gregory Haskins <GHaskins@...ell.com>
Subject: RE: nmi_watchdog fix for x86_64 to be more like i386

On Thu, 4 Oct 2007, Pallipadi, Venkatesh wrote:
> >-----Original Message-----
> >From: linux-kernel-owner@...r.kernel.org 
> >[mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of 
> >Thomas Gleixner
> >Sent: Monday, October 01, 2007 11:19 PM
> >To: Andi Kleen
> >Cc: Arjan van de Ven; David Bahi; LKML; 
> >linux-rt-users@...r.kernel.org; Andrew Morton; Ingo Molnar; 
> >Gregory Haskins
> >Subject: Re: nmi_watchdog fix for x86_64 to be more like i386
> >
> >>
> >> The only workaround for chipsets ignoring IRQ affinity would 
> >be to keep
> >> track on which CPU irq 0 happens and then restart APIC timer 
> >interrupts
> >> on the others (or send IPIs) as needed. But that would be 
> >fairly ugly.
> >
> >The clock events code does handle this already. The broadcast 
> >interrupt 
> >can come in on any cpu. It's just the nmi watchdog which would 
> >be affected 
> >by that.
> >
> 
> Probably we can workaround this by keeping track of IRQ0 count at percpu
> level and
> use local apic timer + this percpu counter in NMI. Or just increment
> local
> apic timer count in IRQ0 with nohz enabled.

No, I tried that. It's ugly.

The per cpu accounting is the correct way to go if we want to take
care of those systems, which ignore the CPU0 binding of irq0.

See patch against the x86 tree below.

	tglx

-------------------->
commit 093976c7ad206a008bd5de4619f40f6bca4a79c3
Author: Thomas Gleixner <tglx@...elltoy.tec.linutronix.de>
Date:   Fri Oct 5 22:19:18 2007 +0200

    x86: Fix irq0 / local apic timer accounting
    
    The clock events merge introduced a change to the nmi watchdog code to
    handle the not longer increasing local apic timer count in the
    broadcast mode. This is fine for UP, but on SMP it pampers over a
    stuck CPU which is not handling the broadcast interrupt due to the
    unconditional sum up of local apic timer count and irq0 count.
    
    To cover all cases we need to keep track on which CPU irq0 is
    handled. In theory this is CPU#0 due to the explicit disabling of irq
    balancing for irq0, but there are systems which ignore this on the
    hardware level. The per cpu irq0 accounting allows us to remove the
    irq0 to CPU0 binding as well.
    
    Add a per cpu counter for irq0 and evaluate this instead of the global
    irq0 count in the nmi watchdog code.
    
    Signed-off-by: Thomas Gleixner <tglx@...utronix.de>

diff --git a/arch/x86/kernel/nmi_32.c b/arch/x86/kernel/nmi_32.c
index c7227e2..95d3fc2 100644
--- a/arch/x86/kernel/nmi_32.c
+++ b/arch/x86/kernel/nmi_32.c
@@ -353,7 +353,8 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
 	 * Take the local apic timer and PIT/HPET into account. We don't
 	 * know which one is active, when we have highres/dyntick on
 	 */
-	sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0];
+	sum = per_cpu(irq_stat, cpu).apic_timer_irqs +
+		per_cpu(irq_stat, cpu).irq0_irqs;
 
 	/* if the none of the timers isn't firing, this cpu isn't doing much */
 	if (!touched && last_irq_sums[cpu] == sum) {
diff --git a/arch/x86/kernel/time_32.c b/arch/x86/kernel/time_32.c
index 19a6c67..3571d0a 100644
--- a/arch/x86/kernel/time_32.c
+++ b/arch/x86/kernel/time_32.c
@@ -157,6 +157,9 @@ EXPORT_SYMBOL(profile_pc);
  */
 irqreturn_t timer_interrupt(int irq, void *dev_id)
 {
+	/* Keep nmi watchdog up to date */
+	per_cpu(irq_stat, cpu).irq0_irqs++;
+
 #ifdef CONFIG_X86_IO_APIC
 	if (timer_ack) {
 		/*
diff --git a/include/asm-x86/hardirq_32.h b/include/asm-x86/hardirq_32.h
index ed7cf97..9188635 100644
--- a/include/asm-x86/hardirq_32.h
+++ b/include/asm-x86/hardirq_32.h
@@ -9,6 +9,7 @@ typedef struct {
 	unsigned long idle_timestamp;
 	unsigned int __nmi_count;	/* arch dependent */
 	unsigned int apic_timer_irqs;	/* arch dependent */
+	unsigned int irq0_irqs;
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU(irq_cpustat_t, irq_stat);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ