Message-ID: <20181120170842.GZ2131@hirez.programming.kicks-ass.net>
Date: Tue, 20 Nov 2018 18:08:42 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Kyle Huey <me@...ehuey.com>
Cc: Andi Kleen <ak@...ux.intel.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Ingo Molnar <mingo@...nel.org>,
Robert O'Callahan <robert@...llahan.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Jiri Olsa <jolsa@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Stephane Eranian <eranian@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Vince Weaver <vincent.weaver@...ne.edu>, acme@...nel.org,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [REGRESSION] x86, perf: counter freezing breaks rr

On Tue, Nov 20, 2018 at 08:19:39AM -0800, Kyle Huey wrote:
> tl;dr: rr is currently broken on 4.20rc2, which I bisected to
> af3bdb991a5cb57c189d34aadbd3aa88995e0d9f. I further confirmed that
> booting the 4.20rc2 kernel with `disable_counter_freezing=true` allows
> rr to work.
>
> rr, a userspace record and replay debugger[0], uses the PMU interrupt
> (PMI) to stop a program during replay to inject asynchronous events
> such as signals. With perf counter freezing enabled we are reliably
> seeing perf event overcounts during replay. This behavior is easily
> demonstrated by attempting to record and replay the `alarm` test from
> rr's test suite. Through bisection I determined that [1] is the first
> bad commit, and further testing showed that booting the kernel with
> `disable_counter_freezing=true` fixes rr.
>
> This behavior has been observed on two different CPUs (a Core i7-6700K
> and a Xeon E3-1505M v5). We have no reason to believe it is limited to
> specific CPU models; this information is included only for
> completeness.
>
> Given that we're already at rc3, and that this renders rr unusable,
> we'd ask that counter freezing be disabled for the 4.20 release.

Andi, can you have a look at this?

Meanwhile, I suppose we should do something along these lines.

---
Subject: perf/x86/intel: Default disable perfmon v4 interrupt handling

Rework the 'disable_counter_freezing' __setup() parameter such that we
can explicitly enable/disable it and switch to default disabled.

To this end, rename the parameter to "perf_v4_pmi=", which is a much
better description and allows requiring a bool argument.

Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++-
 arch/x86/events/intel/core.c                    | 12 ++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 76c82c01bf5e..ff6d1d4229e0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -856,7 +856,8 @@
 			causing system reset or hang due to sending
 			INIT from AP to BSP.
 
-	disable_counter_freezing [HW]
+	perf_v4_pmi=	[X86,INTEL]
+			Format: <bool>
 			Disable Intel PMU counter freezing feature.
 			The feature only exists starting from
 			Arch Perfmon v4 (Skylake and newer).
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 273c62e81546..af8bea9d4006 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2306,14 +2306,18 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	return handled;
 }
 
-static bool disable_counter_freezing;
+static bool disable_counter_freezing = true;
 static int __init intel_perf_counter_freezing_setup(char *s)
 {
-	disable_counter_freezing = true;
-	pr_info("Intel PMU Counter freezing feature disabled\n");
+	bool res;
+
+	if (kstrtobool(s, &res))
+		return -EINVAL;
+
+	disable_counter_freezing = !res;
 	return 1;
 }
-__setup("disable_counter_freezing", intel_perf_counter_freezing_setup);
+__setup("perf_v4_pmi=", intel_perf_counter_freezing_setup);
 
 /*
  * Simplified handler for Arch Perfmon v4:
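
For anyone wanting to check how the reworked parameter maps its argument onto the flag, here is a minimal userspace sketch of the logic in the patch above. It is not kernel code: parse_bool() is a hypothetical stand-in that only approximates the kernel's kstrtobool() (first character y/Y/1 or n/N/0, plus "on"/"off"), and perf_v4_pmi_setup() is a hypothetical name mirroring intel_perf_counter_freezing_setup().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace approximation of the kernel's kstrtobool().
 * Returns 0 and fills *res on success, -EINVAL on unrecognised input.
 */
static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* Distinguish "on" from "off" by the second character. */
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}

/* Same default as in the patch: counter freezing is off unless requested. */
static bool disable_counter_freezing = true;

/*
 * Hypothetical mirror of the patched setup handler: "perf_v4_pmi=<bool>"
 * enables the v4 PMI path when true, so the disable flag is the negation.
 */
int perf_v4_pmi_setup(const char *s)
{
	bool res;

	if (parse_bool(s, &res))
		return -EINVAL;

	disable_counter_freezing = !res;
	return 1;
}
```

Under these assumptions, perf_v4_pmi_setup("1") leaves disable_counter_freezing false, perf_v4_pmi_setup("off") leaves it true, and an unparseable argument returns -EINVAL without touching the flag.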