[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5a4ab06e-8628-4e1d-addb-2af920deffad@linux.intel.com>
Date: Wed, 18 Dec 2024 09:52:03 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
irogers@...gle.com, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, ak@...ux.intel.com, eranian@...gle.com
Subject: Re: [PATCH V5 4/4] perf/x86/intel: Support PEBS counters snapshotting
On 2024-12-18 3:24 a.m., Peter Zijlstra wrote:
> On Tue, Dec 17, 2024 at 12:45:56PM -0500, Liang, Kan wrote:
>>
>>
>> On 2024-12-17 8:37 a.m., Peter Zijlstra wrote:
>>> On Mon, Dec 16, 2024 at 12:45:05PM -0800, kan.liang@...ux.intel.com wrote:
>>>
>>>> @@ -2049,6 +2102,40 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
>>>> }
>>>> }
>>>>
>>>> + if (format_group & (PEBS_DATACFG_CNTR | PEBS_DATACFG_METRICS)) {
>>>> + struct pebs_cntr_header *cntr = next_record;
>>>> + int bit;
>>>> +
>>>> + next_record += sizeof(struct pebs_cntr_header);
>>>> +
>>>> + for_each_set_bit(bit, (unsigned long *)&cntr->cntr, INTEL_PMC_MAX_GENERIC) {
>>>> + x86_perf_event_update(cpuc->events[bit], (u64 *)next_record);
>>>> + next_record += sizeof(u64);
>>>> + }
>>>
>>> I still don't much like any of this -- the next_record value might be in
>>> the past relative to what is already in the event.
>>>
>>
>> I forgot to handle the read case. With the below patch, the next_record
>> value should not contain a previous value, unless there is a HW bug.
>> Because no matter it's read/pmi/context switch, perf always stop the
>> PMU, drains the buffer, and the value is always from the PEBS record.
>
> Consider this 3 counter case:
>
> A is our regular counter
> B is a PEBS event which reads A
> C is a PEBS event
>
> For convencience, let A count the lines in our example (anything
> incrementing at a faster rate will get the same result after all).
>
> Then, afaict, the following can happen:
>
> B-assist A=1
> C A=2
> B-assist A=3
> A-overflow-PMI A=4
> C-assist-PMI (DS buffer) A=5
>
> So the A-PMI will update A->count to 4.
> Then the C-PMI will process the DS buffer, which will:
>
> - adjust A->count to 1
> - adjust A->count to 3
The patch should have addressed the case. I will update the changelog to
point to the case in the next version.
The below is the code to address the issue in this patch.
Perf will drain the PEBS buffer before handling the non-PEBS overflow.
So the A-PMI will
- Process the DS. adjust A->count to 3
- adjust A->count to 4
The C-PMI will no touch the A->count since there is no record from B.
@@ -3115,6 +3115,14 @@ static int handle_pmi_common(struct pt_regs
*regs, u64 status)
if (!test_bit(bit, cpuc->active_mask))
continue;
+ /*
+ * There may be unprocessed PEBS records in the PEBS buffer,
+ * which still stores the previous values.
+ * Process those records first before handling the latest value.
+ */
+ if (is_pebs_counter_event(event))
+ x86_pmu.drain_pebs(regs, &data);
+
if (!intel_pmu_save_and_restart(event))
continue;
Thanks,
Kan
>
> Leaving us, in the end, with A going backwards.
>
> How is that not a problem?
>
> IIRC I've raised this point before, and your Changelog does not mention
> anything about this issue at all :-(
>
Powered by blists - more mailing lists