[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9a76fcc7-a8e0-4b88-b93c-7dbf65bc695e@amd.com>
Date: Thu, 8 May 2025 16:26:37 +0530
From: Sandipan Das <sandipan.das@....com>
To: Ian Rogers <irogers@...gle.com>
Cc: linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
Kan Liang <kan.liang@...ux.intel.com>, Stephane Eranian
<eranian@...gle.com>, Ravi Bangoria <ravi.bangoria@....com>,
Ananth Narayan <ananth.narayan@....com>
Subject: Re: [PATCH 0/3] perf vendor events amd: Address event errata
On 5/7/2025 9:26 PM, Ian Rogers wrote:
> On Wed, May 7, 2025 at 7:28 AM Sandipan Das <sandipan.das@....com> wrote:
>>
>> Remove unreliable Zen 5 events and metrics. The following errata from
>> the Revision Guide for AMD Family 1Ah Models 00h-0Fh Processors have
>> been addressed.
>> #1569 PMCx078 Counts Incorrectly in Unpredictable Ways
>> #1583 PMCx18E May Overcount Instruction Cache Accesses
>> #1587 PMCx188 May Undercount IBS (Instruction Based Sampling) Fetch Events
>>
>> The document can be downloaded from
>> https://bugzilla.kernel.org/attachment.cgi?id=308095
>
> Hi Sandipan,
>
> the document is somewhat brief, for example:
> ```
> 1583 PMCx18E May Overcount Instruction Cache Accesses
>
> Description
> If PMCx18E[IcAccessTypes] is programmed to 18x (Instruction Cache
> Miss) or 1Fx (All Instruction Cache Accesses) then the performance
> counter may overcount.
>
> Potential Effect on System
> Inaccuracies in performance monitoring software may be experienced.
>
> Suggested Workaround
> None
>
> Fix Planned
> No fix planned
> ```
> Given being able to count instruction cache accesses (for example) is
> a useful feature, would it be possible to change:
> ```
> - {
> - "EventName": "ic_tag_hit_miss.instruction_cache_hit",
> - "EventCode": "0x18e",
> - "BriefDescription": "Instruction cache hits.",
> - "UMask": "0x07"
> - },
> ...
> ```
> to be say:
> ```
> {
> "EventName": "ic_tag_hit_miss.instruction_cache_hit",
> "EventCode": "0x18e",
> "BriefDescription": "Instruction cache hits. Note, this counter is
> affected by errata 1583.",
> "UMask": "0x07",
> "Experimental": "1"
> },
> ```
> That is rather than remove the event, the event is tagged as
> experimental (taken to mean accuracy isn't guaranteed) and the errata
> is explicitly noted in the description. Currently the Experimental tag
> has no impact on what happens in the perf tool, for example, the
> "Deprecated" tag hides events in the `perf list` command and is
> commonly used when an event is renamed.
>
I agree that events like IC hits and misses are generally useful and am
fine with the idea of keeping them but my concern is that unless users
read the event description, there is no way for them to know if the
perf output that they are seeing may be unreliable. There is also no
guarantee that such events will be fixed in a future uarch. From a
quick glance, I couldn't find a mechanism that makes perf stat/report
show a warning for named events with known issues.
Powered by blists - more mailing lists