[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fVr43v8gkqi8SXVaNKnkO+cooQVqx3xUFJ-BtgxGHX90g@mail.gmail.com>
Date: Fri, 10 Jan 2025 11:18:53 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: James Clark <james.clark@...aro.org>, Leo Yan <leo.yan@....com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>,
Ze Gao <zegao2021@...il.com>, Weilin Wang <weilin.wang@...el.com>,
Dominique Martinet <asmadeus@...ewreck.org>,
Jean-Philippe Romain <jean-philippe.romain@...s.st.com>, Junhao He <hejunhao3@...wei.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org, Aditya Bodkhe <Aditya.Bodkhe1@....com>,
Atish Patra <atishp@...osinc.com>
Subject: Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@...nel.org> wrote:
>
> On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@...nel.org> wrote:
> > >
> > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > Whilst for many tools it is an expected behavior that failure to open
> > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > legacy events and then failed to rename such events on a server uncore
> > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > open the event on all PMUs that advertise/"have" the event, this
> > > > yielded failures when trying to make the priority of legacy and
> > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > legacy event user on ARM hardware may find their event opened on an
> > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > such events which this patch implements. Rather than have the skipping
> > > > conditional on running on ARM, the skipping is done on all
> > > > architectures as such a fundamental behavioral difference could lead
> > > > to problems with tools built/depending on perf.
> > > >
> > > > An example of perf record failing to open events on x86 is:
> > > > ```
> > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > The LLC-prefetch-read event is not supported.
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > >
> > > I'm afraid this can be too noisy.
> >
> > The intention is to be noisy:
> > 1) it matches the existing behavior, anything else is potentially a regression;
>
> Well.. I think you're changing the behavior. :) Also currently it just
> fails on the first event so it won't be too much noisy.
>
> $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
> event syntax error: 'data_read,data_write,LLC-prefetch-read'
> \___ Bad event name
>
> Unable to find event on a PMU of 'data_read'
> Run 'perf list' for a list of valid events
>
> Usage: perf record [<options>] [<command>]
> or: perf record [<options>] -- <command> [<options>]
>
> -e, --event <event> event selector. use 'perf list' to list available events
Fwiw, this error is an event parsing error not an event opening error.
You need to select an uncore event, I was using data_read which exists
in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
existing error message:
```
$ perf record -e data_read -a true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (data_read).
"dmesg | grep -i perf" may provide additional information.
```
and here it with the series:
```
$ perf record -e data_read -a true
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (data_read).
"dmesg | grep -i perf" may provide additional information.
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (data_read).
"dmesg | grep -i perf" may provide additional information.
Error:
Failure to open any events for recording.
```
and here is what it would be with pr_debug:
```
$ perf record -e data_read -a true
Error:
Failure to open any events for recording.
```
I believe this last output is worst because:
1) If not all events fail to open there is no error reported unless I
know to run with -v, which will also bring a bunch more noise with it,
2) I don't see the PMU / event name and "Invalid argument" indicating
what has gone wrong again unless I know to run with -v and get all the
verbose noise with that.
Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
bug that ARM should have long ago fixed. That should be fixed rather
than hiding errors and making users think they are recording samples
when silently they're not - or they need to search through verbose
output to try to find out if something broke.
> > 2) it only happens if trying to record on a PMU/event that doesn't
> > support recording, something that is currently an error and so we're
> > not motivated to change the behavior as no-one should be using it;
>
> It was caught by Linus, so we know at least one (very important) user.
If they care enough then specifying the PMU with the event will avoid
any warning and has always been a fix for this issue. It was the first
proposed workaround for Linus.
> > 3) for the wildcard case the only offender is ARM's SLC PMU and the
> > appropriate fix there has always been to make the CPU cycle's event
> > name match the bus_cycles event name by calling it cpu_cycles -
> > something that doesn't conflict with a core PMU event name, the thing
> > that has introduced all these problems, patches, long email exchanges,
> > unfixed inconsistencies, etc.. If the errors aren't noisy then there
> > is little motivation for the ARM SLC PMU's event name to be fixed.
>
> I understand your concern but I'm not sure it's the best way to fix the
> issue.
Right, I'm similarly concerned about hiding legitimate warning/error
messages because of 1 event on 1 PMU on 1 architecture because of how
perf gets driven by 1 user. Yes, when you break you can wade through
the verbose output but imo the verbose output was never intended to be
used in that way.
Thanks,
Ian
Powered by blists - more mailing lists