lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20070621162019.650c8aac@windmill.dev.rtsoft.ru>
Date:	Thu, 21 Jun 2007 16:20:19 +0400
From:	Konstantin Baydarov <kbaidarov@...mvista.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	RT <linux-rt-users@...r.kernel.org>
Subject: Re: [PATCH RT] have x86_64 nmi watchdog also count irq 0

On Wed, 20 Jun 2007 14:51:03 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> I was getting false reports about NMI lockups on CPU 3.  For some
> reason CPU 0,1 and 2 where using apic timer, and CPU 3 was using the
> irq 0 timer.
> 
I think you are telling about case when nmi_watchdog is set to IO_APIC
(cmdline:nmi_watchdog=1). I've faced that problem too.
In case when nmi_watchdog is set to IO_APIC local APIC timers are disabled
kernel uses broadcast timer. External timer(PIT or PM I'm not sure) is used
instead of local CPU apic timers. But even that lapic timers are disabled,
lapic timer handler smp_apic_timer_interrupt() is triggered by broadcast timer
IPIs with the same vector as local APIC IRQs.
The problem is that IPIs are sent to all CPUs except CPU that is sending IPIs,
so local_apic_timer_interrupt won't executed for CPU that initiated IPIs,
instead high level timer code(event_handler) will be called explicitly. It's not a
problem in general but in case of watchdog per CPU variable apic_timer_irqs won't
be incremented, that's why we get lockup.
Steven, I think you have Boot CPU with physical id 3, timer handler(irq 0) runs
and sends broadcast IPIs only on Boot CPU so every time you get lockup on CPU3.

Also I suggest to fix i386 kernel, we should use per CPU statistic:
Index: git-linux-2.6/arch/i386/kernel/nmi.c
===================================================================
--- git-linux-2.6.orig/arch/i386/kernel/nmi.c
+++ git-linux-2.6/arch/i386/kernel/nmi.c
@@ -351,7 +351,7 @@ __kprobes int nmi_watchdog_tick(struct p
         * Take the local apic timer and PIT/HPET into account. We don't
         * know which one is active, when we have highres/dyntick on
         */
-       sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_irqs(0);
+       sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0];

        /* if the none of the timers isn't firing, this cpu isn't doing much */
        if (!touched && last_irq_sums[cpu] == sum) {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ