linux-kernel - Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fWS8AzSo=vxcCFUaYMMth7FNMPNbCXjYOGApQ0AitqA2Q@mail.gmail.com>
Date: Wed, 15 Jan 2025 13:20:32 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	James Clark <james.clark@...aro.org>, Ze Gao <zegao2021@...il.com>, 
	Weilin Wang <weilin.wang@...el.com>, Dominique Martinet <asmadeus@...ewreck.org>, 
	Jean-Philippe Romain <jean-philippe.romain@...s.st.com>, Junhao He <hejunhao3@...wei.com>, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	bpf@...r.kernel.org, Aditya Bodkhe <Aditya.Bodkhe1@....com>, 
	Atish Patra <atishp@...osinc.com>, Leo Yan <leo.yan@....com>, 
	Beeman Strong <beeman@...osinc.com>, Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON
 hardware events over legacy"

On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@...nel.org> wrote:
>
> > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@...gle.com> wrote:
> > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > though the PMU doesn't advertise through json or sysfs the legacy
> > > event, the perf tool supports it.
>
> I thought legacy events on hybrid were converted to PMU events.

No, when BIG.little was created nothing changed in perf events but
when Intel did hybrid they wanted to make the hybrid CPUs (atom and
performance) appear as if they were one type. The PMU event encodings
vary a lot for this on Intel, ARM has standards for the encoding.
Intel extended the legacy format to take a PMU type id:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
"EEEEEEEE: PMU type ID"
that is in the top 32-bits of the config.

> > >
> > > Making it so that events without PMUs are only legacy events just
> > > doesn't work. There are far too many existing uses of non-legacy
> > > events without PMU, the metrics contain 100s of examples.
>
> That's unfortunate.  It'd be nice if metrics were written with PMU
> names.

But then we'd end up with things like on Intel:
UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
becoming:
uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
or just:
cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
As a user the first works for me and doesn't have any ambiguity over
PMUs as the event name already encodes the PMU. AMD similarly place
the part of a pipeline into event names. Were we to break everybody by
requiring the PMU we'd also need to explain which PMU to use. Sites
with event lists (like https://perfmon-events.intel.com/) don't
explain the PMU and it'd be messy as on Intel you have a CHA PMU for
server chips but a CBOX on client chips, etc.

> I have a question.  What if an event name in a metric matches to
> multiple unrelated PMUs?

The metric may break or we'd aggregate the unrelated counts together.
Take a metric like IPC as "instructions/cycles", that metric should
work on a hybrid system as they have instructions and cycles. If you
used an event for instructions like inst_retired.any then maybe the
metric will fail on one kind of core that didn't have that event. Now
if we have accelerators advertising instructions and cycles events, we
should be able to compute the metric for the accelerator. What could
happen today is that the accelerator will have a cpumask of a single
CPU, we could aggregate the accelerator counter into the CPU event
with the same CPU as the cpumask, we'd end up with a weird quasi CPU
and accelerator IPC metric for that CPU. What should happen is that we
get an IPC for the accelerator and IPC for each hybrid core
independently, but the way we handle evsels, CPUs, PMUs is not really
set up for that. Hopefully getting a set of PMUs into the evsel will
clear that up. Assuming all of that is cleared up, is it wrong if the
IPC metric is computed for the accelerator if it was originally
written as a CPU metric? Not really. Could there be metrics where that
is the case? Probably, and specifying PMUs in the event names would be
a fix. There have also been proposals that we restrict the PMUs for
certain metrics. As event names are currently so distinct it isn't a
problem we've faced yet and it is not clear it is a problem other than
highlighting tech debt in areas of the tool like aggregation.

> > >
> > > Prior to switching json/sysfs to being the priority when a PMU is
> > > specified, it was the case that all encodings were the same, with or
> > > without a PMU.
> > >
> > > I don't think there is anything natural about assuming things about
> > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > >  - cycles on x86 is only encoded via a legacy event;
> > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > a legacy event name;
> > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > corresponding legacy event name.
>
> I think the behavior should be:
>
>   cycles -> PERF_COUNT_HW_CPU_CYCLES
>   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
>   cpu_cycles -> no legacy -> sysfs or json
>   cpu/cycles/ -> sysfs or json
>   cpu/cpu-cycles/ -> sysfs or json

So I disagree as if you add a PMU to an event name the encoding
shouldn't change:
1) This historically was perf's behavior.
2) Different event encodings can have different behaviors (broken in
some notable cases).
3) Intuitively what wildcarding does is try to open "*/event/" where *
is every possible PMU name. Having different event encodings is
breaking that intuition it could also break situations where you try
to assert equivalence based on type/config.
4) The legacy encodings were (are?) broken on ARM Apple M? CPUs,
that's why the priority was changed.
5) RISC-V would like the tool tackle the legacy to config mapping
challenge, rather than the PMU driver, given the potential diversity
of hardware implementations.

To this end we hosted RISC-V's perf people at Google and they
expressed that their preference was what this series does, and they
expressed this directly to you.

I don't think there would be an issue in this area if it wasn't for
Neoverse and Linus - that's why the revert happened. This change in
behavior was proposed by Arnaldo:
https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
in Google's tree.

Thanks,
Ian