Message-ID: <6316c9aa-005e-e01a-8a54-b3a9c241da7c@linux.intel.com>
Date: Wed, 12 Sep 2018 09:33:36 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: peterz@...radead.org, tglx@...utronix.de, mingo@...hat.com,
acme@...nel.org, linux-kernel@...r.kernel.org
Cc: eranian@...gle.com, ak@...ux.intel.com,
alexander.shishkin@...ux.intel.com
Subject: Re: [PATCH V2 2/3] x86, perf: Add a separate Arch Perfmon v4 PMI
handler
Hi Peter,
Any comments on this patch series regarding the v4 PMI handler?
Thanks,
Kan
On 8/8/2018 3:12 AM, kan.liang@...ux.intel.com wrote:
> From: Andi Kleen <ak@...ux.intel.com>
>
> Implements counter freezing for Arch Perfmon v4 (Skylake and
> newer). This speeds up the PMI handler by avoiding unnecessary
> MSR writes, and makes it more accurate.
>
> The Arch Perfmon v4 PMI handler is substantially different from
> the older PMI handler.
>
> Differences to the old handler:
> - It relies on counter freezing, which eliminates several MSR
> writes from the PMI handler and lowers the overhead significantly.
>
> This also makes the PMI handler more accurate: all counters are
> frozen atomically as soon as any counter overflows, so much less
> of the PMI handler's own execution is counted.
>
> With the freezing we don't need to disable or enable counters or
> PEBS. Only BTS, which does not support auto-freezing, still needs
> to be explicitly managed.
>
> - The PMU is acked at the end, not the beginning. This makes it
> possible to avoid manually enabling/disabling the PMU; instead we
> just rely on the freezing/acking.
>
> - The APIC is acked before reenabling the PMU, which avoids
> problems with LBRs occasionally not getting unfrozen on Skylake.
>
> - Looping is only needed to work around a corner case in which
> several PMIs arrive very close to each other. In the common case
> the counters are frozen during the PMI handler, so no re-check is
> needed (see the simplified flow sketched after this list).
>
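> A minimal sketch of the two flows (hypothetical, heavily simplified
> handlers reusing the existing helpers from intel/core.c; the
> looping, BTS/PEBS and error handling of the real code are omitted):
>
> 	/* <=v3: the handler itself must stop and restart the PMU */
> 	static int sketch_handle_irq_v3(struct pt_regs *regs)
> 	{
> 		u64 status;
> 		int handled;
>
> 		__intel_pmu_disable_all();		/* wrmsr GLOBAL_CTRL */
> 		status = intel_pmu_get_status();	/* rdmsr GLOBAL_STATUS */
> 		intel_pmu_ack_status(status);		/* wrmsr OVF_CTRL */
> 		handled = handle_pmi_common(regs, status);
> 		__intel_pmu_enable_all(0, true);	/* wrmsr GLOBAL_CTRL */
> 		apic_write(APIC_LVTPC, APIC_DM_NMI);	/* late APIC ack */
> 		return handled;
> 	}
>
> 	/* >=v4: the hardware froze all counters when the PMI fired */
> 	static int sketch_handle_irq_v4(struct pt_regs *regs)
> 	{
> 		u64 status;
> 		int handled;
>
> 		status = intel_pmu_get_status();	/* rdmsr GLOBAL_STATUS */
> 		handled = handle_pmi_common(regs, status);
> 		apic_write(APIC_LVTPC, APIC_DM_NMI);	/* ack APIC first */
> 		intel_pmu_ack_status(status);		/* ack = unfreeze, last */
> 		return handled;
> 	}
>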
> This patch:
> - Adds code to enable v4 counter freezing.
> - Forks the <=v3 and >=v4 PMI handlers into separate functions.
> - Adds a kernel parameter to disable counter freezing (see the
> example below). It took some time to debug counter freezing, so
> in case there are new problems we added an option to turn it off.
> We would not expect this to be used unless new bugs appear.
> - Covers big core only. The patch for small core will be posted
> separately later.
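>
> For example, to boot with the feature disabled, append the new
> parameter to the kernel command line (hypothetical command line;
> only the last option matters here):
>
> 	vmlinuz root=/dev/sda1 ro quiet disable_counter_freezing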
>
> Performance:
>
> When profiling a kernel build on Kabylake with different perf
> options, we measured the length of all NMI handlers using the
> nmi_handler trace point:
>
> V3 is without counter freezing.
> V4 is with counter freezing.
> The value is the average cost of the PMI handler.
> (lower is better)
>
> perf options                  V3(ns)  V4(ns)  delta
> -c 100000                       1088     894   -18%
> -g -c 100000                    1862    1646   -12%
> --call-graph lbr -c 100000      3649    3367    -8%
> --call-graph dwarf -c 100000    2248    1982   -12%
>
> Signed-off-by: Andi Kleen <ak@...ux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> ---
>
> Changes since V1:
> - Move enable_counter_freeze() to intel_pmu_cpu_starting().
> - Remove frozen_enabled. The state of counter-freeze feature doesn't
> change after initialization.
> - Use __setup() to replace of module_param
> - Don't print "counter freezing" to log
> - Use bit fields to replace bool for all PMI handler knobs.
> - Update comments and document
>
> Documentation/admin-guide/kernel-parameters.txt | 5 ++
> arch/x86/events/intel/core.c | 112 ++++++++++++++++++++++++
> arch/x86/events/perf_event.h | 4 +-
> arch/x86/include/asm/msr-index.h | 1 +
> 4 files changed, 121 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 533ff5c..cb2a6f68 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -828,6 +828,11 @@
> causing system reset or hang due to sending
> INIT from AP to BSP.
>
> + disable_counter_freezing [HW]
> +			Disable the Intel PMU counter freezing feature.
> +			The feature exists only on Arch Perfmon v4
> +			(Skylake and newer).
> +
> disable_ddw [PPC/PSERIES]
> Disable Dynamic DMA Window support. Use this if
> to workaround buggy firmware.
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index a7d7759..fdd2f99 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -1995,6 +1995,18 @@ static void intel_pmu_nhm_enable_all(int added)
> intel_pmu_enable_all(added);
> }
>
> +static void enable_counter_freeze(void)
> +{
> + update_debugctlmsr(get_debugctlmsr() |
> + DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI);
> +}
> +
> +static void disable_counter_freeze(void)
> +{
> + update_debugctlmsr(get_debugctlmsr() &
> + ~DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI);
> +}
> +
> static inline u64 intel_pmu_get_status(void)
> {
> u64 status;
> @@ -2290,6 +2302,91 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
> return handled;
> }
>
> +static bool disable_counter_freezing;
> +static int __init intel_perf_counter_freezing_setup(char *s)
> +{
> + disable_counter_freezing = true;
> + pr_info("Intel PMU Counter freezing feature disabled\n");
> + return 1;
> +}
> +__setup("disable_counter_freezing", intel_perf_counter_freezing_setup);
> +
> +/*
> + * Simplified handler for Arch Perfmon v4:
> + * - We rely on counter freezing/unfreezing to enable/disable the PMU.
> + * This is done automatically on PMU ack.
> + * - Ack the PMU only after the APIC.
> + */
> +
> +static int intel_pmu_handle_irq_v4(struct pt_regs *regs)
> +{
> + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> + int handled = 0;
> + bool bts = false;
> + u64 status;
> + int pmu_enabled = cpuc->enabled;
> + int loops = 0;
> +
> + /* PMU has been disabled because of counter freezing */
> + cpuc->enabled = 0;
> + if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
> + bts = true;
> + intel_bts_disable_local();
> + handled = intel_pmu_drain_bts_buffer();
> + handled += intel_bts_interrupt();
> + }
> + status = intel_pmu_get_status();
> + if (!status)
> + goto done;
> +again:
> + intel_pmu_lbr_read();
> + if (++loops > 100) {
> + static bool warned;
> +
> + if (!warned) {
> + WARN(1, "perfevents: irq loop stuck!\n");
> + perf_event_print_debug();
> + warned = true;
> + }
> + intel_pmu_reset();
> + goto done;
> + }
> +
> + handled += handle_pmi_common(regs, status);
> +done:
> + /* Ack the PMI in the APIC */
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
> +
> +	/*
> +	 * The counters start counting immediately while the status
> +	 * is acked. Make the ack as close as possible to the IRET.
> +	 * This avoids bogus freezing on Skylake CPUs.
> +	 */
> + if (status) {
> + intel_pmu_ack_status(status);
> + } else {
> +	/*
> +	 * The CPU may issue two PMIs very close to each other.
> +	 * When the PMI handler services the first one, the
> +	 * GLOBAL_STATUS is already updated to reflect both.
> +	 * When it IRETs, the second PMI is handled immediately
> +	 * and sees a clear status. Meanwhile there may be a
> +	 * third PMI, because the freeze bit was not set again
> +	 * after the ack in the first PMI handler.
> +	 * Double check whether there is more work to be done.
> +	 */
> + status = intel_pmu_get_status();
> + if (status)
> + goto again;
> + }
> +
> + if (bts)
> + intel_bts_enable_local();
> + cpuc->enabled = pmu_enabled;
> + return handled;
> +}
> +
> /*
> * This handler is triggered by the local APIC, so the APIC IRQ handling
> * rules apply:
> @@ -3361,6 +3458,9 @@ static void intel_pmu_cpu_starting(int cpu)
> if (x86_pmu.version > 1)
> flip_smm_bit(&x86_pmu.attr_freeze_on_smi);
>
> + if (x86_pmu.counter_freezing)
> + enable_counter_freeze();
> +
> if (!cpuc->shared_regs)
> return;
>
> @@ -3432,6 +3532,9 @@ static void intel_pmu_cpu_dying(int cpu)
> free_excl_cntrs(cpu);
>
> fini_debug_store_on_cpu(cpu);
> +
> + if (x86_pmu.counter_freezing)
> + disable_counter_freeze();
> }
>
> static void intel_pmu_sched_task(struct perf_event_context *ctx,
> @@ -4325,6 +4428,8 @@ __init int intel_pmu_init(void)
> x86_pmu.extra_regs = intel_skl_extra_regs;
> x86_pmu.pebs_aliases = intel_pebs_aliases_skl;
> x86_pmu.pebs_prec_dist = true;
> +		x86_pmu.counter_freezing = !disable_counter_freezing;
> /* all extra regs are per-cpu when HT is on */
> x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> @@ -4442,6 +4547,13 @@ __init int intel_pmu_init(void)
> pr_cont("full-width counters, ");
> }
>
> +	/*
> +	 * For Arch Perfmon v4 use counter freezing to avoid
> +	 * several MSR accesses in the PMI handler.
> +	 */
> + if (x86_pmu.counter_freezing)
> + x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
> +
> kfree(to_free);
> return 0;
> }
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 1562863..adae087 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -560,9 +560,11 @@ struct x86_pmu {
> struct event_constraint *event_constraints;
> struct x86_pmu_quirk *quirks;
> int perfctr_second_write;
> - bool late_ack;
> u64 (*limit_period)(struct perf_event *event, u64 l);
>
> + /* PMI handler bits */
> + unsigned int late_ack :1,
> + counter_freezing :1;
> /*
> * sysfs attrs
> */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 68b2c31..4ae4a59 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -157,6 +157,7 @@
> #define DEBUGCTLMSR_BTS_OFF_OS (1UL << 9)
> #define DEBUGCTLMSR_BTS_OFF_USR (1UL << 10)
> #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1UL << 11)
> +#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12)
> #define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
> #define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
>
>