lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fU3fk79xtNAGwk35PCkiii=QoBdhbzit1Ax9OsEtrPExg@mail.gmail.com>
Date: Thu, 8 May 2025 08:34:26 -0700
From: Ian Rogers <irogers@...gle.com>
To: Sandipan Das <sandipan.das@....com>
Cc: linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	Stephane Eranian <eranian@...gle.com>, Ravi Bangoria <ravi.bangoria@....com>, 
	Ananth Narayan <ananth.narayan@....com>
Subject: Re: [PATCH 0/3] perf vendor events amd: Address event errata

On Thu, May 8, 2025 at 3:56 AM Sandipan Das <sandipan.das@....com> wrote:
>
> On 5/7/2025 9:26 PM, Ian Rogers wrote:
> > On Wed, May 7, 2025 at 7:28 AM Sandipan Das <sandipan.das@....com> wrote:
> >>
> >> Remove unreliable Zen 5 events and metrics. The following errata from
> >> the Revision Guide for AMD Family 1Ah Models 00h-0Fh Processors have
> >> been addressed.
> >> #1569 PMCx078 Counts Incorrectly in Unpredictable Ways
> >> #1583 PMCx18E May Overcount Instruction Cache Accesses
> >> #1587 PMCx188 May Undercount IBS (Instruction Based Sampling) Fetch Events
> >>
> >> The document can be downloaded from
> >> https://bugzilla.kernel.org/attachment.cgi?id=308095
> >
> > Hi Sandipan,
> >
> > the document is somewhat brief, for example:
> > ```
> > 1583 PMCx18E May Overcount Instruction Cache Accesses
> >
> > Description
> > If PMCx18E[IcAccessTypes] is programmed to 18x (Instruction Cache
> > Miss) or 1Fx (All Instruction Cache Accesses) then the performance
> > counter may overcount.
> >
> > Potential Effect on System
> > Inaccuracies in performance monitoring software may be experienced.
> >
> > Suggested Workaround
> > None
> >
> > Fix Planned
> > No fix planned
> > ```
> > Given being able to count instruction cache accesses (for example) is
> > a useful feature, would it be possible to change:
> > ```
> > -  {
> > -    "EventName": "ic_tag_hit_miss.instruction_cache_hit",
> > -    "EventCode": "0x18e",
> > -    "BriefDescription": "Instruction cache hits.",
> > -    "UMask": "0x07"
> > -  },
> > ...
> > ```
> > to be say:
> > ```
> >   {
> >     "EventName": "ic_tag_hit_miss.instruction_cache_hit",
> >     "EventCode": "0x18e",
> >     "BriefDescription": "Instruction cache hits. Note, this counter is
> > affected by errata 1583.",
> >     "UMask": "0x07",
> >     "Experimental": "1"
> >   },
> > ```
> > That is rather than remove the event, the event is tagged as
> > experimental (taken to mean accuracy isn't guaranteed) and the errata
> > is explicitly noted in the description. Currently the Experimental tag
> > has no impact on what happens in the perf tool, for example, the
> > "Deprecated" tag hides events in the `perf list` command and is
> > commonly used when an event is renamed.
> >
>
> I agree that events like IC hits and misses are generally useful and am
> fine with the idea of keeping them but my concern is that unless users
> read the event description, there is no way for them to know if the
> perf output that they are seeing may be unreliable. There is also no
> guarantee that such events will be fixed in a future uarch. From a
> quick glance, I couldn't find a mechanism that makes perf stat/report
> show a warning for named events with known issues.

So I'm forgetting the flow, but rediscovering it. We do have an Errata
json value as shown in:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/arm64/ampere/ampereone/memory.json?h=perf-tools-next#n2
```
    {
       "ArchStdEvent": "LD_RETIRED",
       "Errata": "Errata AC03_CPU_52",
       "BriefDescription": "Instruction architecturally executed,
condition code check pass, load.
Impacted by errata -"
   },
```
It doesn't impact perf stat/record but it does get added to the event
description for perf list:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/jevents.py?h=perf-tools-next#n340
```
    if 'Errata' in jd:
      extra_desc += '  Spec update: ' + jd['Errata']
```
which means the perf list description ends up as "Instruction
architecturally executed, condition code check pass, load. Impacted by
errata -  Spec update: Errata AC03_CPU_52". We could change this so
that the Errata is distinct in the encoded in perf json and then we
could display the errata when perf stat/record parses the event. I'd
be a little worried about this breaking things that parse perf's text
output, but the impact would be limited to Zen5, Ampere and older
Intel CPUs. We could also make the errata output conditional on
passing a verbose flag to perf. Would just `perf list` support work
for you or would the perf stat/record changes be a requirement for
keeping these events?

Thanks,
Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ