Message-ID: <20100809194829.GB26154@erda.amd.com>
Date:	Mon, 9 Aug 2010 21:48:29 +0200
From:	Robert Richter <robert.richter@....com>
To:	Don Zickus <dzickus@...hat.com>
CC:	Cyrill Gorcunov <gorcunov@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Lin Ming <ming.m.lin@...el.com>, Ingo Molnar <mingo@...e.hu>,
	"fweisbec@...il.com" <fweisbec@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...el.com>,
	Yinghai Lu <yinghai@...nel.org>,
	Andi Kleen <andi@...stfloor.org>
Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs

On 06.08.10 10:21:31, Don Zickus wrote:
> On Fri, Aug 06, 2010 at 08:52:03AM +0200, Robert Richter wrote:

> > I was playing around with it yesterday trying to fix this. My idea is
> > to skip an unknown nmi if the previous nmi was a *handled* perfctr
> 
> You might want to add a little more logic that says *handled* _and_ had
> more than one perfctr trigger.  Most of the time only one perfctr is
> probably triggering, so you might be eating unknown_nmi's needlessly.
> 
> Just a thought.

Yes, that's true. It could be implemented on top of the patch below.
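Roughly, that refinement could sit on top of the patch below like this
(untested sketch; x86_pmu.handle_irq() would have to be extended to
return the number of events it handled, and the perfctr_handled_cnt
name is only for illustration):

static DEFINE_PER_CPU(unsigned int, perfctr_handled_cnt);

	/* in perf_event_nmi_handler(), DIE_NMIUNKNOWN case: */
	this_nmi = percpu_read(irq_stat.__nmi_count);
	prev_nmi = __get_cpu_var(perfctr_handled);
	/*
	 * Eat the unknown nmi only if the previous nmi handled more
	 * than one counter, otherwise pass it on.
	 */
	if (this_nmi == prev_nmi + 1 &&
	    __get_cpu_var(perfctr_handled_cnt) > 1)
		return NOTIFY_STOP;
	return NOTIFY_DONE;

	/* ... and where the pmu handled the nmi (handled is an int): */
	handled = x86_pmu.handle_irq(regs);
	if (!handled)
		return NOTIFY_DONE;
	/* remember the nmi count and how many counters we handled */
	__get_cpu_var(perfctr_handled) = percpu_read(irq_stat.__nmi_count);
	__get_cpu_var(perfctr_handled_cnt) = handled;

	return NOTIFY_STOP;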

> 
> > nmi. I will probably post an rfc patch early next week.

Here it comes:

From d2739578199d881ae6a9537c1b96a0efd1cdea43 Mon Sep 17 00:00:00 2001
From: Robert Richter <robert.richter@....com>
Date: Thu, 5 Aug 2010 16:19:59 +0200
Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs

When perfctrs are running it is valid to have unhandled nmis: two
events could trigger 'simultaneously', raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty and daze
the CPU.

The solution to avoid an 'unknown nmi' message in this case was simply
to stop the nmi handler chain when perfctrs are running by stating
the nmi was handled. This has the drawback that a) we cannot detect
unknown nmis anymore, and b) subsequent nmi handlers are not called.

This patch addresses both drawbacks. Now we drop the unknown NMI only
if the previous NMI handled a perfctr. Otherwise we pass it on and let
the kernel handle the unknown nmi. The check runs only if no nmi
handler could handle the nmi (DIE_NMIUNKNOWN case).

This could be improved further by checking whether perf handled more
than one counter in the previous NMI; if it handled only one, we could
pass the unknown nmi on as well.

Signed-off-by: Robert Richter <robert.richter@....com>
---
 arch/x86/kernel/cpu/perf_event.c |   39 +++++++++++++++++++++++++++++--------
 1 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..c3cd159 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1200,12 +1200,16 @@ void perf_events_lapic_init(void)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 }
 
+static DEFINE_PER_CPU(unsigned int, perfctr_handled);
+
 static int __kprobes
 perf_event_nmi_handler(struct notifier_block *self,
 			 unsigned long cmd, void *__args)
 {
 	struct die_args *args = __args;
 	struct pt_regs *regs;
+	unsigned int this_nmi;
+	unsigned int prev_nmi;
 
 	if (!atomic_read(&active_events))
 		return NOTIFY_DONE;
@@ -1214,7 +1218,26 @@ perf_event_nmi_handler(struct notifier_block *self,
 	case DIE_NMI:
 	case DIE_NMI_IPI:
 		break;
-
+	case DIE_NMIUNKNOWN:
+		/*
+		 * This one could be our NMI, two events could trigger
+		 * 'simultaneously' raising two back-to-back NMIs. If
+		 * the first NMI handles both, the latter will be
+		 * empty and daze the CPU.
+		 *
+		 * So, we drop this unknown NMI if the previous NMI
+		 * was handling a perfctr. Otherwise we pass it and
+		 * let the kernel handle the unknown nmi.
+		 *
+		 * Note: this could be improved if we drop unknown
+		 * NMIs only if we handled more than one perfctr in
+		 * the previous NMI.
+		 */
+		this_nmi = percpu_read(irq_stat.__nmi_count);
+		prev_nmi = __get_cpu_var(perfctr_handled);
+		if (this_nmi == prev_nmi + 1)
+			return NOTIFY_STOP;
+		return NOTIFY_DONE;
 	default:
 		return NOTIFY_DONE;
 	}
@@ -1222,14 +1245,12 @@ perf_event_nmi_handler(struct notifier_block *self,
 	regs = args->regs;
 
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	/*
-	 * Can't rely on the handled return value to say it was our NMI, two
-	 * events could trigger 'simultaneously' raising two back-to-back NMIs.
-	 *
-	 * If the first NMI handles both, the latter will be empty and daze
-	 * the CPU.
-	 */
-	x86_pmu.handle_irq(regs);
+
+	if (!x86_pmu.handle_irq(regs))
+		return NOTIFY_DONE;
+
+	/* handled */
+	__get_cpu_var(perfctr_handled) = percpu_read(irq_stat.__nmi_count);
 
 	return NOTIFY_STOP;
 }
-- 
1.7.1.1

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

