linux-kernel - Re: [V2 PATCH 0/6] x86, NMI: give NMI handler a face-lift

Message-ID: <4CE543BB.2010903@windriver.com>
Date:	Thu, 18 Nov 2010 09:18:19 -0600
From:	Jason Wessel <jason.wessel@...driver.com>
To:	Don Zickus <dzickus@...hat.com>
CC:	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
	Robert Richter <robert.richter@....com>, ying.huang@...el.com,
	Andi Kleen <andi@...stfloor.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>, gorcunov@...il.com
Subject: Re: [V2 PATCH 0/6] x86, NMI: give NMI handler a face-lift

On 11/18/2010 08:32 AM, Don Zickus wrote:
> On Thu, Nov 18, 2010 at 02:17:12PM +0100, Peter Zijlstra wrote:
>> On Thu, 2010-11-18 at 06:47 -0600, Jason Wessel wrote:
>>> More specifically
>>> when another subsystem injects an NMI event the perf NMI code returns
>>> NOTIFY_STOP. 
>> Not unconditionally, right? We only do so when the previous NMI was from
>> the PMU and nobody claimed this one (NOTIFY_STOP from DIE_NMIUNKNOWN).
>>
>> Or are you hitting the other one, where !handled but pmu_nmi.handled >
>> 1 ?
> 
> On my Nehalem box, the kgdb tests work fine, no issues there.  On my P4
> box, the p4 handler really thinks the NMIs are from the perf counter and
> returns handled==1 and starves the kgdb tests.
> 
> I haven't gotten around to checking Jason's kvm setup to determine which
> handler his setup is calling.
> 
> Jason, could you snip part of your dmesg log that shows the output with
> "Performance Events:" (or just send me the whole thing :-) ).
> 

I can see the hang in 3 of 4 qemu / kvm configurations running with
"-smp 2".

  qemu == Performance Events: p6 PMU driver.

  qemu-system-x86_64 == Performance Events: AMD PMU driver.

  kvm on RHEL 5 == Performance Events: p6 PMU driver.

The kgdb tests pass with 64-bit kvm on an Ubuntu 10.10 test system, which prints:

  Performance Events: unsupported p6 CPU model 2 no PMU driver, software
  events only.

I suspect it works because of the "unsupported": with no PMU driver
there is nothing in perf to claim the NMI.  The real P4 system I have
also reproduces the hang, just like what you are seeing.

I don't believe this is the right way to fix the problem, but the
patch below does work around it.  I made the patch simply so I could
run some more tests and fix some of the other regressions I did not
know about.  It is merely a crude way to say that an NMI arriving
while the debugger is active really doesn't belong to the perf
callback.  The problem is that it would be extremely racy if you are
starting and stopping the debugger: there would be the possibility of
lost events, etc.
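
To make the effect of the workaround concrete, here is a minimal
userspace sketch of the check.  It is only a model, not kernel code:
the model_* functions and the pmu_claims_nmi parameter are hypothetical
stand-ins, and kgdb_active / active_events are plain C11 atomics rather
than the kernel's atomic_t.  It assumes, as in the patch, that
kgdb_active holds -1 when no CPU is in the debugger.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model values mirroring the kernel's notifier return codes. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_STOP 0x8001

/* Stand-ins for kernel state: -1 means no CPU is in the debugger. */
static atomic_int kgdb_active = -1;
static atomic_int active_events = 1;

#define in_debug_core() (atomic_load(&kgdb_active) != -1)

/*
 * Hypothetical model of the perf NMI handler.  pmu_claims_nmi stands
 * in for the PMU driver deciding that the NMI came from a counter.
 */
static int model_perf_nmi_handler(bool pmu_claims_nmi)
{
	/* The workaround: step aside whenever the debugger is active. */
	if (!atomic_load(&active_events) || in_debug_core())
		return NOTIFY_DONE;

	return pmu_claims_nmi ? NOTIFY_STOP : NOTIFY_DONE;
}

/* Hypothetical helpers that model entering and leaving the debugger. */
static void model_debugger_enter(int cpu)
{
	atomic_store(&kgdb_active, cpu);
}

static void model_debugger_exit(void)
{
	atomic_store(&kgdb_active, -1);
}
```

In this model, an NMI that the PMU would otherwise claim is passed on
with NOTIFY_DONE between model_debugger_enter() and
model_debugger_exit().  That is also where the race comes from: a
genuine PMU NMI that lands inside that window is skipped by perf too.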

Jason.

--

---
 arch/x86/kernel/cpu/perf_event.c |    3 ++-
 include/linux/kgdb.h             |    3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -25,6 +25,7 @@
 #include <linux/highmem.h>
 #include <linux/cpu.h>
 #include <linux/bitops.h>
+#include <linux/kgdb.h>
 
 #include <asm/apic.h>
 #include <asm/stacktrace.h>
@@ -1221,7 +1222,7 @@ perf_event_nmi_handler(struct notifier_b
 	unsigned int this_nmi;
 	int handled;
 
-	if (!atomic_read(&active_events))
+	if (!atomic_read(&active_events) || in_debug_core())
 		return NOTIFY_DONE;
 
 	switch (cmd) {
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -307,12 +307,15 @@ extern int kgdb_nmicallback(int cpu, voi
 
 extern int			kgdb_single_step;
 extern atomic_t			kgdb_active;
+#define in_debug_core() \
+	(atomic_read(&kgdb_active) != -1)
 #define in_dbg_master() \
 	(raw_smp_processor_id() == atomic_read(&kgdb_active))
 extern bool dbg_is_early;
 extern void __init dbg_late_init(void);
 #else /* ! CONFIG_KGDB */
 #define in_dbg_master() (0)
+#define in_debug_core() (0)
 #define dbg_late_init()
 #endif /* ! CONFIG_KGDB */
 #endif /* _KGDB_H_ */