linux-kernel - Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open

Message-ID: <Z5qcd6upvrPOqayY@google.com>
Date: Wed, 29 Jan 2025 13:24:07 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>, Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Kan Liang <kan.liang@...ux.intel.com>,
	James Clark <james.clark@...aro.org>, Ze Gao <zegao2021@...il.com>,
	Weilin Wang <weilin.wang@...el.com>,
	Dominique Martinet <asmadeus@...ewreck.org>,
	Jean-Philippe Romain <jean-philippe.romain@...s.st.com>,
	Junhao He <hejunhao3@...wei.com>, linux-perf-users@...r.kernel.org,
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
	Aditya Bodkhe <Aditya.Bodkhe1@....com>, Leo Yan <leo.yan@....com>,
	Atish Patra <atishp@...osinc.com>
Subject: Re: [PATCH v5 3/4] perf record: Skip don't fail for events that
 don't open

Hi Ian,

Sorry for the delay.

On Wed, Jan 15, 2025 at 09:56:59AM -0800, Ian Rogers wrote:
> On Wed, Jan 15, 2025 at 9:31 AM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > On Mon, Jan 13, 2025 at 03:04:26PM -0800, Ian Rogers wrote:
> > > On Mon, Jan 13, 2025 at 12:51 PM Namhyung Kim <namhyung@...nel.org> wrote:
> > > >
> > > > Hi Ian,
> > > >
> > > > On Fri, Jan 10, 2025 at 01:33:57PM -0800, Ian Rogers wrote:
> > > > > On Fri, Jan 10, 2025 at 11:26 AM Namhyung Kim <namhyung@...nel.org> wrote:
> > > > > >
> > > > > > On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
[...]
> > > > > > > A patch lowering the priority of error messages should be independent
> > > > > > > of the 4 changes here. I'd be happy if someone follows this series
> > > > > > > with a patch doing it.
> > > > > >
> > > > > > I think the error behavior is a part of this change.
> > > > >
> > > > > I disagree with it, so I think you need to address my comments.
> > > >
> > > > You are changing the error behavior by skipping failed events then the
> > > > relevant error messages should be handled properly in this patchset.
> > >
> > > I'm not sure what you are asking and I'm not sure why it matters?
> > > Previously you'd asked for all the output to be moved under verbose.
> > >
> > > If I specify an event that doesn't work with perf record today then it
> > > fails. With this patch it fails too. If that event is a core PMU event
> > > then there will be an error message for each core PMU that doesn't
> > > support the event. So I get 2 error messages on hybrid. This doesn't
> > > feel egregious or warrant a new error message mechanism. I would like
> > > it so that evsels supported 1 or more PMUs, in which case this would
> > > be 1 error message.
> > >
> > > If I specify perf record today on an uncore event then perf record
> > > fails and I get 1 error message for the uncore PMU. The new behavior
> > > will be to get 1 error message per uncore PMU. If I'm on a server with
> > > 10s of uncore PMUs then maybe the message is spammy, but the command
> > > fails today and will continue to fail with this series. I don't see a
> > > motivation to change or optimize for this case and again, evsels that
> > > support >1 PMU would be the most appropriate fix.
> > >
> > > The only case where there is no message today but would be with this
> > > patch series is for cycles on ARM's neoverse. There will be one
> > > warning for the evsel on the SLC PMU. That's one warning and not many.
> > >
> > > As I've said, if you want a more elaborate error reporting system then
> > > take these patches and add it to them. There's a larger refactor to
> > > make evsels support >1 PMU that would clean up the many events on
> > > server uncore PMUs issue, but that shouldn't be part of this series
> > > nor gate it. If you are trying to perf record on uncore PMUs then you
> > > already have problems and optimizing the error messages for your
> > > mistake, I don't get why it matters?
> >
> > What about with multiple events in the command line - one of them
> > failing with >1 PMUs and the command now succeeds?
> 
> So this would be something like:
> ```
> $ perf record -e cycles,instructions,data_read -a sleep 1
> ```
> where data_read is an uncore PMU event. The current behavior is:
> ```
> $ perf record -e cycles,instructions,data_read -a sleep 1
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> ```
> The new behavior is:
> ```
> $ perf record -e cycles,instructions,data_read -a sleep 1
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.138 MB perf.data (11670 samples) ]
> ```
> 
> We know nobody does this, as the command currently fails. It succeeds
> with this change, because that's the whole point of the change.

Well, I think it's because it failed before.  New users can come along at
any time and do whatever they want (or can).  They might pass 100 failing
events with 1 successful event, and this would give a ton of warnings.
So it'd be better to ratelimit the message and make it optional (with -v).
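
To make the idea concrete, here is a rough sketch of what such limiting
could look like.  It is not perf code: struct open_warn_state,
warn_open_failure(), warn_open_summary() and the cap are all made-up
names for illustration, and printing the first few warnings without -v
is just one possible reading of "ratelimit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state; none of these names exist in perf. */
struct open_warn_state {
	int verbose;      /* stand-in for the -v level */
	int max_warnings; /* warnings printed before suppression without -v */
	int seen;         /* open failures seen so far */
};

/* Returns true if the warning was printed, false if it was suppressed. */
static bool warn_open_failure(struct open_warn_state *st, FILE *out,
			      const char *event, const char *pmu)
{
	assert(st != NULL && out != NULL);

	st->seen++;
	/* Without -v, stay quiet once the cap has been reached. */
	if (st->verbose <= 0 && st->seen > st->max_warnings)
		return false;

	fprintf(out,
		"Failure to open event '%s' on PMU '%s' which will be removed.\n",
		event, pmu);
	return true;
}

/* Prints one summary line if anything was suppressed; returns that count. */
static int warn_open_summary(const struct open_warn_state *st, FILE *out)
{
	int suppressed = 0;

	assert(st != NULL && out != NULL);

	if (st->verbose <= 0 && st->seen > st->max_warnings)
		suppressed = st->seen - st->max_warnings;
	if (suppressed > 0)
		fprintf(out,
			"%d more open failure(s) suppressed; use -v to see them.\n",
			suppressed);
	return suppressed;
}
```

With a cap of 2 and no -v, the 100 failing events case would print two
warnings plus one summary line; with -v every failure is still shown.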

But more importantly, I think we should agree on patch 4 first.

Thanks,
Namhyung


> I'm not offended by seeing the event was being opened on >1 PMU. For the
> only currently succeeding situation where this will now warn, the
> cycles case on Neoverse because of the buggy event name in ARM's SLC
> PMU, there will be 1 warning. For my example the appropriate fix is to
> remove the data_read event. For the Neoverse case, specifying the PMU
> resolves the issue until ARM fixes their driver.