linux-kernel - Re: [Patch v2 10/24] perf/x86/intel: Process arch-PEBS records or record fragments

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <59321f8a-c804-4c0d-8941-3805c078d2c6@linux.intel.com>
Date: Thu, 27 Feb 2025 10:04:09 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>,
 Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo
 <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
 Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Andi Kleen <ak@...ux.intel.com>, Eranian Stephane <eranian@...gle.com>,
 linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
 Dapeng Mi <dapeng1.mi@...el.com>
Subject: Re: [Patch v2 10/24] perf/x86/intel: Process arch-PEBS records or
 record fragments


On 2/26/2025 11:45 PM, Liang, Kan wrote:
>
> On 2025-02-26 4:35 a.m., Peter Zijlstra wrote:
>> On Wed, Feb 26, 2025 at 01:20:37PM +0800, Mi, Dapeng wrote:
>>
>>>> Also, should that workaround have been extended to also include
>>>> GLOBAL_STATUS_PERF_METRICS_OVF in that mask, or was that defect fixed
>>>> for every chip capable of metrics stuff?
>>> hmm,  per my understanding, GLOBAL_STATUS_PERF_METRICS_OVF handling should
>>> only be skipped when fixed counter 3 or perf metrics are included in PEBS
>>> counter group. In this case, the slots and topdown metrics have been
>>> updated by PEBS handler. It should not be processed again.
>>>
>>> @Kan Liang, is it correct?
>> Right, so the thing is, *any* PEBS event pending will clear METRICS_OVF
>> per:
>>
>>                 status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
>>
> Yes, we have to add it for both legacy PEBS and ARCH PEBS.
>
> An alternative way may change the order of handling the overflow bit.
>
> The commit daa864b8f8e3 ("perf/x86/pebs: Fix handling of PEBS buffer
> overflows") has moved the "status &= ~cpuc->pebs_enabled;" out of PEBS
> overflow code.
>
> As long as the PEBS overflow is handled after PT, I don't think the
> above is required anymore.
>
> It should be similar to METRICS_OVF. But the PEBS counters snapshotting
> should be specially handled, since the PEBS will handle the metrics
> counter as well.
>
> @@ -3211,7 +3211,8 @@ static int handle_pmi_common(struct pt_regs *regs,
> u64 status)
>  	/*
>  	 * Intel Perf metrics
>  	 */
> -	if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned
> long *)&status)) {
> +	if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned
> long *)&status) &&
> +	
> !is_pebs_counter_event_group(cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS])) {
>  		handled++;
>  		static_call(intel_pmu_update_topdown_event)(NULL, NULL);
>  	}

Yes, we still need to handle METRICS_OVF if fixed counter 3 and metrics are
not included into counter group. It ensure the metrics count can be updated
timely once PERF_METRICS MSR overflows.

Since there were more and more bits added into  GLOBAL_CTRL_STAT MSR in
past several years, it becomes not correct to execute the below code in
BUFFER_OVF_BIT handling code.

    status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;

It unconditionally clears METRICS_OVF and other bits which could be added
in the future. This is incorrect and could introduce potential issues.

Combining Kan's change, I think we can change the code like this.

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8ef5b9a05fcc..0cf0f95b1af4 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3192,7 +3192,6 @@ static int handle_pmi_common(struct pt_regs *regs,
u64 status)
        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
        int bit;
        int handled = 0;
-       u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);

        inc_irq_stat(apic_perf_irqs);

@@ -3236,7 +3235,6 @@ static int handle_pmi_common(struct pt_regs *regs,
u64 status)
                handled++;
                x86_pmu_handle_guest_pebs(regs, &data);
                static_call(x86_pmu_drain_pebs)(regs, &data);
-               status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;

                /*
                 * PMI throttle may be triggered, which stops the PEBS event.
@@ -3269,12 +3267,17 @@ static int handle_pmi_common(struct pt_regs *regs,
u64 status)

        /*
         * Intel Perf metrics
+        * If PEBS counter group includes fix counter 3, PEBS handler would
update
+        * topdown events which is more accurate, it's unnecessary to
update again.
         */
-       if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT,
(unsigned long *)&status)) {
+       if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT,
(unsigned long *)&status) &&
+          
!is_pebs_counter_event_group(cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS])) {
                handled++;
                static_call(intel_pmu_update_topdown_event)(NULL, NULL);
        }

+       status &= hybrid(cpuc->pmu, intel_ctrl);
+
        /*
         * Checkpointed counters can lead to 'spurious' PMIs because the
         * rollback caused by the PMI will have cleared the overflow status


>
> Thanks,
> Kan
>
>