lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080905102416.GA11212@elte.hu>
Date:	Fri, 5 Sep 2008 12:24:16 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Vivek Goyal <vgoyal@...hat.com>,
	"Mingarelli, Thomas" <Thomas.Mingarelli@...com>,
	Andi Kleen <andi@...stfloor.org>,
	Prarit Bhargava <prarit@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"arozansk@...hat.com" <arozansk@...hat.com>,
	"ak@...ux.intel.com" <ak@...ux.intel.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Maciej W. Rozycki" <macro@...ux-mips.org>
Subject: Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback


* Don Zickus <dzickus@...hat.com> wrote:

> Prarit's patch disabled the timer upon registering a callback to 
> prevent this case.  The thought was if you have your own handler you 
> could provide your own watchdog.

btw., while we are talking about watchdog design problems, the current 
way the NMI timer watchdog oprofile fallback uses the die handler and 
how it all interacts with the ioapic/lapic-timer NMI watchdog is 
misdesigned as well. (this is used in the relatively rare but still 
possible case where no primary oprofile handler is found and installed)

The i386 and x86_64 architecture code started using die notifiers for 
the NMI timer watchdog oprofile code two years ago, via commit 2fbe7b25 
"i386/x86-64: Remove un/set_nmi_callback and reserve/release_lapic_nmi 
functions". [ Hey - what a coincidence - that was done by you! ;-) ]

It was a nice cleanup but not complete: using a die notifier for the NMI 
watchdog does not work properly as the NMI watchdog driver has no 
knowledge currently about whether it got activated (often it's not even 
possible to tell it):

 +       case DIE_NMI:
 +               oprofile_add_sample(args->regs, 0);
 +               ret = NOTIFY_STOP;
 +               break;

it always returns NOTIFY_STOP. That breaks all other lower prio 
notifiers down the chain which might have proper knowledge about the 
source of the NMI. Such kind of cross-notifier breakage is one of the 
reasons why driver notifiers try to become exclusive and try to play 
tricks with the nmi watchdog.

Instead the full solution would be for it to return NOTIFY_OK when it is 
not the source of the NMI (and is able to tell it).

[
  The reason why i qualified my statement with "when it is able to tell 
  it" is that while it is possible to disambiguate the NMI source when 
  the source of the NMI was the local apic timer (we already do that in 
  lapic_wd_event() - we return 1 if we rearmed the counter in the CPU), 
  it is not possible to tell it reliably that the NMI source was the
  IO-APIC.

  The reason for that is that the IO-APIC generated NMIs are 
  fundamentally 'anonymous' in that case: we put the legacy PIC into 
  auto-EOI mode and the IO-APIC broadcasts the NMIs and they are edge 
  triggered (on all but some really ancient systems) . So there's no 
  permanent and reliable information left about why the NMI was raised 
  in the CPU. This is a property of the hardware, we cannot solve it on 
  the kernel side. (we tried explicit ACKs before, and those are even 
  worse.)
]

So the right long-term solution is to move all NMI watchdog code 
(including the lockup detection bits - nmi_watchdog_tick(), etc.) into 
the notifier, then to propagate the re-arm information from 
lapic_wd_event() back into the notifier return code, and perhaps to 
print a (one-time) warning into the kernel log about the (in essence) 
deactivated lower-prio notifiers when the IO-APIC watchdog is activated.

This is all non-default debug or instrumentation functionality so if 
they cancel out each other in certain rare cases that's not a big issue 
in practice, as long as the user is informed about the constraints.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ